INDONESIAN HOAX NEWS CLASSIFICATION USING FEATURE SELECTION

Bibliographic Details
Main Author:	Rasywir, Errissya
Format:	Theses
Language:	Indonesian
Online Access:	https://digilib.itb.ac.id/gdl/view/51859
Institution:	Institut Teknologi Bandung

Description
Summary:	Hoax information needs to be classified because it is misleading and dangerous. Previous work classified hoaxes in email and SMS; hoax news articles had not yet been classified. Feature selection is needed to improve the accuracy of hoax article classification. In this study, a collection of Indonesian hoax news is preprocessed, and feature selection experiments are then performed, including combinations by union and intersection. The feature selection methods used are information gain, mutual information, chi-square, term frequency, and TFxIDF; the selected features are classified using Naive Bayes, SVM, and C4.5, with unigrams, bigrams, and a mixture of both as feature models. The document collection contains 220 articles (89 hoax and 131 non-hoax) from 22 topics, with 10 articles (hoax and non-hoax) per topic. In total, 270 tests were run for feature selection without combination and 360 tests for feature selection combined by union and intersection, with parameters including 3 feature models, 2 stemming settings, 2 stopword elimination settings, 5 feature selection methods, 3 classifiers, and 3 feature counts. The best result, 91.36% accuracy, came from the union of mutual information and information gain; information gain alone yielded 90.45%. The intersection operation gave an accuracy of 90%, lower than both. All tests used 10-fold cross-validation. In the error analysis, the best F1 score reached 1 and the lowest was 0.815. The experiments also showed that probability-based feature selection performs better than frequency-based feature selection.
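
The union and intersection combination described in the summary can be illustrated with a minimal Python sketch. This is not the thesis implementation: the toy documents, the function names, the cutoff k = 3, and the choice to score mutual information as the maximum pointwise mutual information over classes are all assumptions made for illustration. Each method ranks the vocabulary, the top-k terms are kept, and the two term sets are then merged with set union or set intersection.

```python
import math
from collections import Counter

# Toy tokenized documents with labels (hypothetical data, not from the thesis).
DOCS = [
    (["free", "miracle", "cure", "share"], "hoax"),
    (["miracle", "share", "warning"], "hoax"),
    (["free", "warning", "share"], "hoax"),
    (["minister", "budget", "report"], "non_hoax"),
    (["budget", "report", "free"], "non_hoax"),
    (["minister", "report", "warning"], "non_hoax"),
]


def entropy(counts):
    """Shannon entropy (in bits) of a collection of class counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(term, docs):
    """Reduction in class entropy from knowing whether `term` occurs."""
    labels = Counter(label for _, label in docs)
    with_term = Counter(label for toks, label in docs if term in toks)
    without_term = labels - with_term  # keeps only positive counts
    conditional = 0.0
    for part in (with_term, without_term):
        size = sum(part.values())
        if size:
            conditional += size / len(docs) * entropy(part.values())
    return entropy(labels.values()) - conditional


def mutual_information(term, docs):
    """Assumed variant: maximum pointwise mutual information over classes."""
    n = len(docs)
    df = sum(1 for toks, _ in docs if term in toks)
    best = float("-inf")
    for label, n_label in Counter(label for _, label in docs).items():
        joint = sum(1 for toks, l in docs if l == label and term in toks)
        if joint:
            best = max(best, math.log2(joint * n / (df * n_label)))
    return best


def top_k(score_fn, docs, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    vocab = sorted({t for toks, _ in docs for t in toks})
    ranked = sorted(vocab, key=lambda t: (-score_fn(t, docs), t))
    return set(ranked[:k])


K = 3  # hypothetical cutoff; the summary does not state the feature counts
ig_set = top_k(information_gain, DOCS, K)
mi_set = top_k(mutual_information, DOCS, K)

union_set = ig_set | mi_set         # terms kept by either method
intersection_set = ig_set & mi_set  # terms kept by both methods

print("IG:", sorted(ig_set))
print("MI:", sorted(mi_set))
print("union:", sorted(union_set))
print("intersection:", sorted(intersection_set))
```

The union keeps every term that either method ranks highly, so it can only widen the feature set, while the intersection can only narrow it. That is consistent with, though it does not by itself explain, the summary's report that the union scored highest and the intersection scored lower than both.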

Similar Items