AUTOMATIC MULTI-LABEL CLASSIFICATION OF ECONOMIC PHENOMENON NEWS FOR SUPPORTING THE FORMULATION OF GROSS DOMESTIC PRODUCT (GDP)
Format: Thesis
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/52275
Institution: Institut Teknologi Bandung
Summary: Gross Domestic Product (GDP) is a macroeconomic indicator used to measure
the economic development of a region. GDP is compiled by Statistics Indonesia (BPS)
using the System of National Accounts (SNA) methodology. During the calculation,
BPS analyzes news about economic phenomena to check and improve the quality of the
GDP figures. This information is gathered from several sources, such as online
news and print media. The analysis is currently done manually by the BPS team,
which is time-consuming. A further problem arises when an economic article
concerns more than one industrial category: because some economic phenomenon news
can belong to more than one sector, classifying it requires a multi-label
approach.
This study uses the Problem Transformation approach to automatically classify
multi-label economic phenomenon news according to GDP by Industry, which consists
of 17 categories and 13 sub-categories. The multi-label classification models are
built with three methods: Binary Relevance (BR), Classifier Chain (CC), and Label
Powerset (LP). The models are evaluated with example-based measures. The corpus
consists of 1,000 articles for training and 100 articles for testing. Four feature
vectorization techniques are also compared: TF-IDF, word2vec, fastText, and
doc2vec. The best model combines TF-IDF, LP, and Linear SVC, achieving a micro
F1-score of 75% in validation, which rises to 82% in the testing scenario. The
output of this research is intended as a tool to speed up the classification of
economic phenomenon news and thereby make the analysis process more efficient.
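The three problem-transformation methods named above can be illustrated on toy data. The following is a minimal pure-Python sketch, not the thesis code: the four category names, the example headlines, and the helper names `binary_relevance`, `classifier_chain`, and `label_powerset` are hypothetical. It only shows how each method reshapes a multi-label dataset into ordinary single-label problems; a base classifier (such as the Linear SVC mentioned above) would then be trained on the result.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy setup: four illustrative industry categories and made-up headlines.
LABELS = ["Agriculture", "Manufacturing", "Trade", "Transportation"]
Article = Tuple[str, FrozenSet[str]]  # (text, set of categories it belongs to)


def binary_relevance(data: List[Article], labels: List[str]) -> Dict[str, List[Tuple[str, int]]]:
    """Binary Relevance: one independent yes/no dataset per label."""
    return {label: [(text, int(label in tags)) for text, tags in data] for label in labels}


def classifier_chain(
    data: List[Article], order: List[str]
) -> Dict[str, List[Tuple[str, Tuple[int, ...], int]]]:
    """Classifier Chain (training view): the k-th binary problem also receives the
    values of the labels earlier in the chain as extra inputs. At prediction time
    those extra inputs would come from the chain's own earlier predictions."""
    problems = {}
    for k, label in enumerate(order):
        problems[label] = [
            (text, tuple(int(prev in tags) for prev in order[:k]), int(label in tags))
            for text, tags in data
        ]
    return problems


def label_powerset(data: List[Article]) -> Tuple[List[Tuple[str, int]], Dict[FrozenSet[str], int]]:
    """Label Powerset: every distinct label combination becomes one class."""
    class_of: Dict[FrozenSet[str], int] = {}
    transformed = []
    for text, tags in data:
        if tags not in class_of:
            class_of[tags] = len(class_of)  # assign the next unused class id
        transformed.append((text, class_of[tags]))
    return transformed, class_of


data: List[Article] = [
    ("rice harvest prices rise", frozenset({"Agriculture"})),
    ("textile exports strain port logistics", frozenset({"Manufacturing", "Transportation"})),
    ("retail sales increase before holidays", frozenset({"Trade"})),
    ("new fertilizer plant supplies farmers", frozenset({"Agriculture", "Manufacturing"})),
]

br = binary_relevance(data, LABELS)
cc = classifier_chain(data, LABELS)
lp_data, lp_classes = label_powerset(data)

print([y for _, y in br["Manufacturing"]])  # [0, 1, 0, 1]
print([y for _, y in lp_data])              # [0, 1, 2, 3]
```

The trade-off between the methods follows from these shapes: Binary Relevance treats each label independently and ignores correlations between sectors, Label Powerset captures correlations but can only predict label combinations that appear in the training data, and Classifier Chain sits between them by feeding earlier labels forward.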
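The summary names example-based measures for evaluation but reports a micro F1-score, which is generally grouped with label-based measures. The sketch below is illustrative only: the function names and the label sets are hypothetical and made up, not the thesis's evaluation code or data. It computes both scores on the same predictions to show how they differ.

```python
from typing import List, Set


def micro_f1(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Micro-averaged F1: pool true positives, false positives and false negatives
    over every (article, label) decision, then compute a single F1."""
    tp = fp = fn = 0
    for true, pred in zip(y_true, y_pred):
        tp += len(true & pred)   # labels correctly predicted
        fp += len(pred - true)   # labels predicted but not present
        fn += len(true - pred)   # labels present but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0  # assumption: all-empty input scores 1


def example_based_f1(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Example-based F1: F1 between each article's true and predicted label sets,
    averaged over articles."""
    scores = []
    for true, pred in zip(y_true, y_pred):
        denom = len(true) + len(pred)
        # Assumption: an article with no true and no predicted labels scores 1.
        scores.append(2 * len(true & pred) / denom if denom else 1.0)
    return sum(scores) / len(scores)


# Made-up predictions for three articles (labels abbreviated A to D).
y_true = [{"A"}, {"B", "C"}, {"A", "C"}]
y_pred = [{"A"}, {"B"}, {"C", "D"}]

print(round(micro_f1(y_true, y_pred), 4))          # 0.6667
print(round(example_based_f1(y_true, y_pred), 4))  # 0.7222
```

On the same predictions the two measures differ (2/3 versus 13/18) because micro averaging weights every label decision equally, while example-based F1 weights every article equally. A reported score is therefore easier to interpret when it states which averaging it uses.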