AUTOMATIC MULTI-LABEL CLASSIFICATION OF ECONOMIC PHENOMENON NEWS FOR SUPPORTING THE FORMULATION OF GROSS DOMESTIC PRODUCT (GDP)

Bibliographic Details
Main Author: Junardi, Wira
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/52275
Institution: Institut Teknologi Bandung
Description
Summary: Gross Domestic Product (GDP) is a macroeconomic indicator used to gauge the economic development of a region. GDP is compiled by Statistics Indonesia (BPS) using the System of National Accounts (SNN) methodology. As part of the calculation, BPS analyzes news about economic phenomena to check and improve the quality of GDP figures. Data on these phenomena are drawn from several sources, such as online news and print media. At present, the BPS team analyzes economic phenomena manually, which takes a long time. A further problem arises when an article involves more than one industrial category: because some economic phenomena belong to more than one sector, classifying this news calls for a multi-label classification approach. This study uses the Problem Transformation method to automatically classify economic phenomenon news by type of GDP according to Industrial Field, which consists of 17 categories and 13 sub-categories. The multi-label classification models are built with three methods: Binary Relevance, Classifier Chain, and Label Powerset. The models are evaluated using Example-Based Measures. The corpus prepared for training and testing comprises 1,000 and 100 articles, respectively. The study also uses four feature vectorization techniques: TF-IDF, word2vec, fastText, and doc2vec. The best model combines TF-IDF, Label Powerset (LP), and Linear SVC, achieving a micro F1-score of 75% on validation, rising to 82% in the testing scenario. The output of this research is intended as a tool to speed up the classification of economic phenomenon news so that the analysis process becomes more efficient.
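To make the problem transformation idea concrete, the following minimal Python sketch shows how Binary Relevance and Label Powerset reshape multi-label targets, and how a micro-averaged F1-score can be computed from predicted label sets. It is an illustration only, not the thesis implementation: the sector names, the sample label sets, and all function names are hypothetical, and the feature extraction (e.g. TF-IDF) and the classifier (e.g. Linear SVC) are deliberately left out.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical label sets for four articles (sector names are illustrative only).
y_true: List[Set[str]] = [
    {"agriculture"},
    {"mining", "manufacturing"},
    {"agriculture"},
    {"trade", "transport"},
]


def binary_relevance_targets(label_sets: List[Set[str]]) -> Dict[str, List[int]]:
    """Binary Relevance: one independent 0/1 target column per label."""
    labels = sorted(set().union(*label_sets))
    return {lab: [int(lab in s) for s in label_sets] for lab in labels}


def label_powerset_encode(
    label_sets: List[Set[str]],
) -> Tuple[List[int], Dict[FrozenSet[str], int]]:
    """Label Powerset: each distinct label combination becomes one class id."""
    classes: Dict[FrozenSet[str], int] = {}
    encoded: List[int] = []
    for s in label_sets:
        key = frozenset(s)
        if key not in classes:
            classes[key] = len(classes)  # assign ids in order of first appearance
        encoded.append(classes[key])
    return encoded, classes


def label_powerset_decode(
    class_ids: List[int], classes: Dict[FrozenSet[str], int]
) -> List[Set[str]]:
    """Map single-label class ids back to their original label sets."""
    inverse = {idx: set(key) for key, idx in classes.items()}
    return [inverse[i] for i in class_ids]


def micro_f1(truth: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all articles and labels."""
    tp = sum(len(t & p) for t, p in zip(truth, pred))
    fp = sum(len(p - t) for t, p in zip(truth, pred))
    fn = sum(len(t - p) for t, p in zip(truth, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


br = binary_relevance_targets(y_true)
encoded, classes = label_powerset_encode(y_true)
decoded = label_powerset_decode(encoded, classes)

# Hypothetical predictions, as if produced by some trained multi-label model.
y_pred: List[Set[str]] = [
    {"agriculture"},
    {"mining"},
    {"agriculture", "trade"},
    {"trade", "transport"},
]
score = micro_f1(y_true, y_pred)

print("Binary Relevance columns:", br)
print("Label Powerset classes:", encoded)
print("Micro F1:", round(score, 3))
```

After the Label Powerset encoding, any ordinary single-label classifier can be trained on the class ids, and its predictions decoded back into label sets. The trade-off is that only label combinations seen during training can ever be predicted, whereas Binary Relevance can output unseen combinations but ignores correlations between labels.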