A MULTILABEL CLASSIFICATION USING PROBLEM TRANSFORMATION APPROACH AND MACHINE LEARNING FOR MULTIPLE EVENT DETECTION
Main Author: | |
---|---|
Format: | Theses |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/27620 |
Institution: | Institut Teknologi Bandung |
Summary: | Social media stores a great deal of valuable information. Some of it, shared by city residents, reveals events occurring in urban areas. Shared information such as reports of congestion and floods can support decision-making in city management. Twitter is one of the most popular social media platforms. A single tweet sometimes contains information about more than one type of event at the same time, which means the tweet is associated with more than one label, a setting known as multilabel classification. The purpose of this research is to design a system architecture and find the best model for detecting events in users' tweets as a multilabel classification task, using the problem transformation approach and machine learning (ML) algorithms. <br />
<br />
Two problem transformation methods are used: binary relevance (BR) and label powerset (LP). In outline, the study is divided into four main parts: data collection, data labelling, data processing, and data classification. The events detected in this research are those related to traffic and natural disasters. The experimental results show that the proposed system architecture successfully detects events in both single-label and multilabel classification. The best model for multiple event detection is obtained by combining LP, the random forest algorithm, and Gini index feature selection. <br />
<br />
The multilabel classification experiments show an accuracy of 87.0%, an F-score of 89.1%, and a Hamming loss of 9.2%. The research also measured the out-of-vocabulary (OOV) rate: the proportion of OOV tokens, i.e., tokens not found in the training data, reached 79.96%. However, the difference in classifier performance between in-vocabulary and OOV data is reported as very small: 45.5% in accuracy, 5.2% in F-score, and 3.2% in Hamming loss. |
---|
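The two problem transformation methods named in the summary, and the Hamming loss metric it reports, can be illustrated with a minimal sketch. This is not the thesis's implementation: the label set, the toy tweets, and all function names below are hypothetical, chosen only to show how binary relevance turns one multilabel problem into one binary problem per label, while label powerset turns it into a single multiclass problem whose classes are label combinations.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical label set and toy data, for illustration only.
LABELS: List[str] = ["traffic", "flood"]
tweets: List[Tuple[str, Set[str]]] = [
    ("heavy congestion because of flooding", {"traffic", "flood"}),
    ("toll road is jammed", {"traffic"}),
    ("flood in the housing area", {"flood"}),
    ("sunny weather today", set()),
]

def binary_relevance(label_sets: List[Set[str]], labels: List[str]) -> Dict[str, List[int]]:
    """BR: build one independent 0/1 target vector per label."""
    return {lab: [int(lab in s) for s in label_sets] for lab in labels}

def label_powerset(label_sets: List[Set[str]], labels: List[str]) -> List[Tuple[int, ...]]:
    """LP: encode each label combination as one class of a single multiclass problem."""
    return [tuple(int(lab in s) for lab in labels) for s in label_sets]

def hamming_loss(y_true: List[Tuple[int, ...]], y_pred: List[Tuple[int, ...]]) -> float:
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp))
    return wrong / total

label_sets = [labels for _, labels in tweets]
br_targets = binary_relevance(label_sets, LABELS)  # two binary problems
lp_classes = label_powerset(label_sets, LABELS)    # one four-class problem

# A made-up prediction that wrongly adds "flood" to the second tweet.
predicted = [(1, 1), (1, 1), (0, 1), (0, 0)]
loss = hamming_loss(lp_classes, predicted)
```

In this sketch, each entry of `br_targets` would be fed to its own binary classifier, whereas `lp_classes` would be fed to a single multiclass classifier such as a random forest. With one wrong label slot out of eight, `loss` comes out to 0.125.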