INDONESIAN HOAX NEWS CLASSIFICATION USING FEATURE SELECTION

Bibliographic Details
Main Author: Rasywir, Errissya
Format: Theses
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/51859
Institution: Institut Teknologi Bandung
id id-itb.:51859
timestamp 2020-10-21T08:52:48Z
keywords hoax; hoax article; feature model; feature selection; classifier; text document classification; union; intersection; k-fold cross validation
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Hoax information needs to be classified because hoaxes contain misleading and dangerous information. Previous work classified email and SMS hoaxes; hoax articles had not yet been classified. Feature selection is needed to improve the accuracy of hoax article classification. In this study, a collection of Indonesian hoax news is preprocessed, and feature selection experiments are then performed using union and intersection operations. The feature selection methods used are information gain, mutual information, chi-square, term frequency, and TFxIDF, classified with Naive Bayes, SVM, and C4.5, using unigrams, bigrams, and a mixture of both as the feature model. The document collection consists of 220 articles (89 hoax and 131 non-hoax) from 22 topics, with 10 articles (hoax and non-hoax) per topic. In total, 270 tests were run for feature selection without combination and 360 tests for feature selection with union and intersection combinations, over the following parameters: 3 feature models, 2 stemming settings, 2 stopword elimination settings, 5 feature selection methods, 3 classifiers, and 3 variants of the number of features. The best result, 91.36% accuracy, came from combining mutual information and information gain with the union operation. Information gain alone yielded 90.45%, while the intersection operation gave an accuracy lower than both, 90%. All tests used 10-fold cross-validation. In the error analysis, the best F1 score achieved is 1 and the lowest is 0.815. The experiments also show that probability-based feature selection performs better than frequency-based feature selection.
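The union and intersection combinations described above can be illustrated with a minimal sketch. It is not the thesis implementation: the toy corpus is made up, the helper names (`information_gain`, `chi_square`, `top_k`) are hypothetical, and it assumes that combining two selectors means taking the union or intersection of their top-k term sets. Information gain and chi-square are used here only because both are short to write in pure Python; the record does not say how the thesis computed or combined the scores.

```python
import math
from collections import Counter

# Toy, made-up corpus (NOT the thesis data): (tokens, label) pairs.
DOCS = [
    ("share this warning before it is deleted".split(), "hoax"),
    ("doctors hide this miracle cure share now".split(), "hoax"),
    ("warning miracle water cures all share".split(), "hoax"),
    ("ministry publishes annual health report".split(), "non-hoax"),
    ("city council approves new water budget".split(), "non-hoax"),
    ("health ministry report confirms budget".split(), "non-hoax"),
]


def entropy(counts):
    """Shannon entropy (bits) of a collection of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(term, docs):
    """H(class) minus H(class | term present or absent)."""
    labels = Counter(label for _, label in docs)
    with_t = Counter(label for toks, label in docs if term in toks)
    without_t = labels - with_t  # Counter subtraction keeps positive counts only
    conditional = 0.0
    for part in (with_t, without_t):
        size = sum(part.values())
        if size:
            conditional += size / len(docs) * entropy(part.values())
    return entropy(labels.values()) - conditional


def chi_square(term, docs, positive="hoax"):
    """Chi-square statistic of the 2x2 term-presence / class table."""
    a = sum(1 for toks, y in docs if term in toks and y == positive)
    b = sum(1 for toks, y in docs if term in toks and y != positive)
    c = sum(1 for toks, y in docs if term not in toks and y == positive)
    d = sum(1 for toks, y in docs if term not in toks and y != positive)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def top_k(score, docs, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    vocab = sorted({t for toks, _ in docs for t in toks})
    ranked = sorted(vocab, key=lambda t: (-score(t, docs), t))
    return set(ranked[:k])


ig_terms = top_k(information_gain, DOCS, 5)
chi_terms = top_k(chi_square, DOCS, 5)

union_terms = ig_terms | chi_terms         # kept if either selector picks the term
intersection_terms = ig_terms & chi_terms  # kept only if both selectors pick it

print("union:", sorted(union_terms))
print("intersection:", sorted(intersection_terms))
```

Under these assumptions, the union can only grow the feature set and the intersection can only shrink it, which is one plausible reading of why the two operations would yield different accuracies.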
format Theses
author Rasywir, Errissya
title INDONESIAN HOAX NEWS CLASSIFICATION USING FEATURE SELECTION
url https://digilib.itb.ac.id/gdl/view/51859