INDONESIAN HOAX NEWS CLASSIFICATION USING FEATURE SELECTION
Hoax information needs to be classified because hoaxes contain misleading and dangerous information. Previous classification work addressed email and SMS hoaxes; hoax articles had not yet been classified. Feature selection is required to improve the accuracy of the classifi...
Main Author: | Rasywir, Errissya |
---|---|
Format: | Theses |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/51859 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:51859 |
---|---|
spelling |
id-itb.:51859 2020-10-21T08:52:48Z INDONESIAN HOAX NEWS CLASSIFICATION USING FEATURE SELECTION Rasywir, Errissya Indonesia Theses hoax; hoax article; feature model; feature selection; classifier; text document classification; union; intersection; k-fold cross validation. INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/51859 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
Hoax information needs to be classified because hoaxes contain misleading and
dangerous information. Previous classification work addressed email and SMS
hoaxes; hoax articles had not yet been classified. Feature selection is required to
improve the accuracy of hoax article classification.
In this study, a collection of Indonesian hoax news is preprocessed, and feature
selection experiments are then performed using union and intersection
combinations. The feature selection methods used are information gain, mutual
information, chi-square, term frequency and TFxIDF, with classification by Naive
Bayes, SVM and C4.5 and with unigrams, bigrams and a mixture of both as the
feature model. The document collection contains 220 articles (89 hoax and 131
non-hoax articles) from 22 topics, where every topic has 10 articles (hoax and
non-hoax). In total, 270 tests were run for feature selection without combination
and 360 tests for feature selection with union and intersection combinations, with
parameters of 3 feature models, 2 stemming options, 2 stopword elimination
options, 5 feature selection methods, 3 classifiers and 3 variants of the number of
features.
The best result, an accuracy of 91.36%, came from combining mutual information
and information gain with the union operation. Information gain alone yielded
90.45%. The intersection operation yielded an accuracy of 90%, lower than both.
Testing used 10-fold cross validation. In the error analysis, the best F1 score
reached 1 and the lowest was 0.815. These experiments also showed that
probability-based feature selection performs better than frequency-based feature
selection. |
format |
Theses |
author |
Rasywir, Errissya |
title |
INDONESIAN HOAX NEWS CLASSIFICATION USING FEATURE SELECTION |
url |
https://digilib.itb.ac.id/gdl/view/51859 |
_version_ |
1822928867329835008 |
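
The description above combines two feature selection methods by taking the union or the intersection of the features each one selects. The following is a minimal sketch of that idea, not the thesis implementation. The toy documents, the labels, the value of `k`, and all function names are hypothetical, and mutual information is assumed here to mean the maximum pointwise mutual information between a term and any class, which may differ from the definition used in the thesis.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Information gain of a binary term-presence feature for the class label."""
    with_term = Counter(l for d, l in zip(docs, labels) if term in d)
    without_term = Counter(l for d, l in zip(docs, labels) if term not in d)
    n = len(docs)
    h_class = entropy(list(Counter(labels).values()))
    h_cond = (sum(with_term.values()) / n) * entropy(list(with_term.values()))
    h_cond += (sum(without_term.values()) / n) * entropy(list(without_term.values()))
    return h_class - h_cond


def max_pointwise_mi(docs, labels, term):
    """Maximum over classes of log2(P(t, c) / (P(t) * P(c))). Assumed definition."""
    n = len(docs)
    n_term = sum(1 for d in docs if term in d)
    best = float("-inf")
    for cls, n_cls in Counter(labels).items():
        n_both = sum(1 for d, l in zip(docs, labels) if l == cls and term in d)
        if n_both:
            best = max(best, math.log2(n_both * n / (n_term * n_cls)))
    return best


def top_k(scores, k):
    """The k highest-scoring terms, with ties broken alphabetically."""
    return set(sorted(scores, key=lambda t: (-scores[t], t))[:k])


# Hypothetical toy corpus: each document is a set of unigram tokens.
docs = [
    {"share", "warning", "virus"},
    {"warning", "forward", "share"},
    {"minister", "said", "report"},
    {"report", "official", "said"},
    {"share", "report", "virus"},
    {"official", "minister", "warning"},
]
labels = ["hoax", "hoax", "non-hoax", "non-hoax", "hoax", "non-hoax"]

vocabulary = sorted(set().union(*docs))
k = 3  # hypothetical number of features kept per method

ig_scores = {t: information_gain(docs, labels, t) for t in vocabulary}
mi_scores = {t: max_pointwise_mi(docs, labels, t) for t in vocabulary}

ig_top = top_k(ig_scores, k)
mi_top = top_k(mi_scores, k)

union = ig_top | mi_top          # features chosen by either method
intersection = ig_top & mi_top   # features chosen by both methods

print("union:", sorted(union))
print("intersection:", sorted(intersection))
```

The union keeps at most `2 * k` features and the intersection at most `k`, so the two combinations also change how many features reach the classifier. When such a selection is evaluated with k-fold cross validation, the scores would normally be computed on the training folds only, so that the held-out fold does not influence which features are kept.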