HIERARCHICAL MULTILABEL CLASSIFICATION USING DISTRIBUTED SEMANTIC MODEL BASED FEATURES FOR INDONESIAN NEWS ARTICLE

Automatic news categorization is essential to handle multi-variant news articles. This research employs hierarchical multilabel classification to conduct news categorization. Based on our previous research, performance of hierarchical multilabel classification mode...

Bibliographic Details
Main Author:	CLAIRINE IRSAN - NIM: 23516081 , IVANA
Format:	Theses
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/28175
Institution:	Institut Teknologi Bandung

id	id-itb.:28175
spelling	id-itb.:28175 2018-03-16T08:56:04Z HIERARCHICAL MULTILABEL CLASSIFICATION USING DISTRIBUTED SEMANTIC MODEL BASED FEATURES FOR INDONESIAN NEWS ARTICLE CLAIRINE IRSAN - NIM: 23516081 , IVANA Indonesia Theses INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/28175 Automatic news categorization is essential to handle multi-variant news articles. This research employs hierarchical multilabel classification to conduct news categorization. Based on our previous research, performance of hierarchical multilabel classification model needs to be improved. There are several things that could potentially improve hierarchical multilabel classification’s performance. First method is by using deep learning classifier to classify news at parent level, in this case, CNN will be used to build the classifier. Second, by using word vector’s average from word embedding, and the third method is by combining word’s term frequency with word’s vector average to build features that will be used to train the multilabel classifiers. Based on the result of this experiment, best performance was 75.31%, achieved by building Calibrated Label Ranking – Naïve Bayes model, and representing document by multiplying word’s term frequency with word’s vector average. This configuration improved multilabel classification performance by 4.25%, compared to the previous result. The distributed semantic model that contributed to achieve best performance was 300 dimension word2vec that was trained using Wikipedia’s articles. Moreover, multilabel classification model is also influenced by news’ release date. If the train data and the test data were collected from different time range, it would decrease model’s performance. This could be seen in this experiment’s results, as the model’s performance was decreased when 5635 data from latest timestamp were added as train data. text
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	Automatic news categorization is essential for handling multi-variant news articles. This research employs hierarchical multilabel classification to categorize news. Our previous research showed that the performance of the hierarchical multilabel classification model needed to be improved. Three methods could potentially improve it. The first is to use a deep learning classifier, in this case a CNN, to classify news at the parent level. The second is to use the average of the word vectors obtained from a word embedding. The third is to combine each word's term frequency with the word vector average to build the features used to train the multilabel classifiers. In this experiment, the best performance was 75.31%, achieved by a Calibrated Label Ranking – Naïve Bayes model with documents represented by multiplying each word's term frequency with the word vector average. This configuration improved multilabel classification performance by 4.25% compared to the previous result. The distributed semantic model that contributed to the best performance was a 300-dimension word2vec model trained on Wikipedia articles. Moreover, the multilabel classification model is also influenced by the news release date: if the training data and the test data are collected from different time ranges, the model's performance decreases. This can be seen in the experiment's results, where the model's performance decreased when 5635 data items from the latest timestamps were added as training data.
format	Theses
author	CLAIRINE IRSAN - NIM: 23516081 , IVANA
spellingShingle	CLAIRINE IRSAN - NIM: 23516081 , IVANA HIERARCHICAL MULTILABEL CLASSIFICATION USING DISTRIBUTED SEMANTIC MODEL BASED FEATURES FOR INDONESIAN NEWS ARTICLE
author_facet	CLAIRINE IRSAN - NIM: 23516081 , IVANA
author_sort	CLAIRINE IRSAN - NIM: 23516081 , IVANA
title	HIERARCHICAL MULTILABEL CLASSIFICATION USING DISTRIBUTED SEMANTIC MODEL BASED FEATURES FOR INDONESIAN NEWS ARTICLE
title_short	HIERARCHICAL MULTILABEL CLASSIFICATION USING DISTRIBUTED SEMANTIC MODEL BASED FEATURES FOR INDONESIAN NEWS ARTICLE
title_full	HIERARCHICAL MULTILABEL CLASSIFICATION USING DISTRIBUTED SEMANTIC MODEL BASED FEATURES FOR INDONESIAN NEWS ARTICLE
title_fullStr	HIERARCHICAL MULTILABEL CLASSIFICATION USING DISTRIBUTED SEMANTIC MODEL BASED FEATURES FOR INDONESIAN NEWS ARTICLE
title_full_unstemmed	HIERARCHICAL MULTILABEL CLASSIFICATION USING DISTRIBUTED SEMANTIC MODEL BASED FEATURES FOR INDONESIAN NEWS ARTICLE
title_sort	hierarchical multilabel classification using distributed semantic model based features for indonesian news article
url	https://digilib.itb.ac.id/gdl/view/28175
_version_	1822922495374655488
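
The two embedding-based document representations named in the description (the word vector average, and term frequency combined with word vectors) can be illustrated with a minimal sketch. The record does not state exactly how the thesis combines term frequency with the vector average, so the weighting below (raw count times word vector, summed over distinct words) is an assumption, as are the function names and the 2-dimensional toy embeddings; the thesis reports using a 300-dimension word2vec model.

```python
from collections import Counter


def average_vector(tokens, embeddings, dim):
    """Mean of the embedding vectors of all in-vocabulary tokens."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        # No known words: fall back to a zero vector.
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def tf_weighted_vector(tokens, embeddings, dim):
    """Sum over distinct in-vocabulary words of count(word) * vector(word).

    Assumption: raw counts are used as term frequency and the result is
    not normalized; the thesis may define the combination differently.
    """
    counts = Counter(t for t in tokens if t in embeddings)
    doc = [0.0] * dim
    for word, tf in counts.items():
        for i, value in enumerate(embeddings[word]):
            doc[i] += tf * value
    return doc


# Hypothetical toy data: two known words, one out-of-vocabulary token.
toy_embeddings = {"harga": [1.0, 0.0], "naik": [0.0, 1.0]}
tokens = ["harga", "naik", "harga", "bbm"]

avg = average_vector(tokens, toy_embeddings, 2)
weighted = tf_weighted_vector(tokens, toy_embeddings, 2)
print(avg, weighted)
```

Either vector could then serve as the feature input to a multilabel classifier; out-of-vocabulary tokens are simply skipped in this sketch.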
