INDONESIAN HOAX NEWS CLASSIFICATION USING FEATURE SELECTION

Bibliographic Details
Main Author:	Rasywir, Errissya
Format:	Theses
Language:	Indonesian
Online Access:	https://digilib.itb.ac.id/gdl/view/51859
Institution:	Institut Teknologi Bandung

id	id-itb.:51859
datestamp	2020-10-21T08:52:48Z
topic	hoax; hoax article; feature model; feature selection; classifier; text document classification; union; intersection; k-fold cross validation
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesian
description	Hoax information needs to be classified because hoaxes contain misleading and dangerous information. Previous classification work has covered email and SMS hoaxes, but hoax articles have not yet been classified, and feature selection is needed to improve the accuracy of hoax article classification. In this study, a collection of Indonesian hoax news is preprocessed, and feature selection experiments are then performed using union and intersection combinations. The feature selection methods used are information gain, mutual information, chi-square, term frequency and TFxIDF; the selected features are classified with Naive Bayes, SVM and C4.5, using unigrams, bigrams and a mixture of both as feature models. The document collection consists of 220 articles (89 hoax and 131 non-hoax) from 22 topics, each topic having 10 articles (hoax and non-hoax). In total, 270 tests were run for feature selection without combination and 360 tests for feature selection with union and intersection combinations, with parameters covering 3 feature models, 2 stemming tests, 2 stopword elimination tests, 5 feature selection methods, 3 classifiers and 3 numbers of features. The best accuracy, 91.36%, was obtained by combining mutual information and information gain with the union operation, whereas information gain alone yielded 90.45%. The intersection operation gave a lower accuracy than both, at 90%. All tests used 10-fold cross validation. In the error analysis, the best F1 score reached 1 and the lowest was 0.815. These experiments also showed that probability-based feature selection performs better than frequency-based feature selection.
format	Theses
author	Rasywir, Errissya
title	INDONESIAN HOAX NEWS CLASSIFICATION USING FEATURE SELECTION
url	https://digilib.itb.ac.id/gdl/view/51859
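The abstract reports that the best accuracy came from the union of the features chosen by mutual information and by information gain. The following Python sketch illustrates only that combination step, under stated assumptions, and is not the thesis's implementation. It treats each document as a set of unigrams and bigrams (binary term presence), scores terms with information gain and with pointwise mutual information maximised over classes (one common text-classification variant; the thesis may define mutual information differently), keeps the top k terms from each, and takes their union and intersection. The toy English corpus, the value of k, and the function names `tokenize`, `information_gain`, `mutual_information` and `top_k` are all hypothetical. Stemming, stopword removal and the classifiers are not included.

```python
# Hypothetical sketch of union/intersection feature selection; not the thesis's code.
import math
from collections import Counter


def tokenize(text: str, max_n: int = 2) -> set[str]:
    """Return the set of 1..max_n word n-grams (unigrams + bigrams by default)."""
    words = text.lower().split()
    feats = set()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            feats.add(" ".join(words[i:i + n]))
    return feats


def _counts(docs, labels):
    """Document frequency of each term, overall and per (term, class)."""
    df, df_class = Counter(), Counter()
    for feats, y in zip(docs, labels):
        for t in feats:
            df[t] += 1
            df_class[(t, y)] += 1
    return len(docs), Counter(labels), df, df_class


def _entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(docs, labels):
    """IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t), using binary term presence."""
    n, class_count, df, df_class = _counts(docs, labels)
    h_c = _entropy([k / n for k in class_count.values()])
    scores = {}
    for t, n_t in df.items():
        n_not = n - n_t
        h_present = _entropy([df_class[(t, c)] / n_t for c in class_count])
        h_absent = (
            _entropy([(class_count[c] - df_class[(t, c)]) / n_not for c in class_count])
            if n_not else 0.0  # term occurs in every document
        )
        scores[t] = h_c - (n_t / n) * h_present - (n_not / n) * h_absent
    return scores


def mutual_information(docs, labels):
    """Pointwise MI log2(P(t,c) / (P(t) P(c))), maximised over classes."""
    n, class_count, df, df_class = _counts(docs, labels)
    scores = {}
    for t, n_t in df.items():
        # Skip classes where the term never occurs (log of zero is undefined).
        scores[t] = max(
            math.log2(df_class[(t, c)] * n / (n_t * n_c))
            for c, n_c in class_count.items()
            if df_class[(t, c)] > 0
        )
    return scores


def top_k(scores, k):
    """The k highest-scoring terms; ties broken alphabetically for determinism."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return {t for t, _ in ranked[:k]}


if __name__ == "__main__":
    corpus = [
        ("viral chain message says vaccine is dangerous", "hoax"),
        ("chain message promises free phone credit", "hoax"),
        ("government announces vaccine schedule", "nonhoax"),
        ("central bank announces interest rate", "nonhoax"),
    ]
    docs = [tokenize(text) for text, _ in corpus]
    labels = [y for _, y in corpus]
    ig_top = top_k(information_gain(docs, labels), k=5)
    mi_top = top_k(mutual_information(docs, labels), k=5)
    print("union:", sorted(ig_top | mi_top))
    print("intersection:", sorted(ig_top & mi_top))
```

On this toy corpus, the pointwise mutual information variant gives the same top score to every term that occurs in only one class, however rare, whereas information gain ranks terms present in all documents of one class and absent from the other (such as "chain message") above terms seen in a single document. The two top-k lists therefore only partly overlap, which is what makes the union and the intersection differ.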

