MULTI-LABEL CLASSIFICATION OF HATE SPEECH AND ABUSIVE LANGUAGE IN INDONESIAN TWITTER
Main Author: | Muhammad Raihan Asyraf Desanto |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/64105 |
Institution: | Institut Teknologi Bandung |
Summary: | ABSTRACT
Multi-label Classification of Hate Speech and Abusive Language in Indonesian Twitter
Muhammad Raihan Asyraf Desanto
NIM : 13517027
Hate speech detection has been studied extensively, but existing studies vary widely in how they define their labels. This study uses the multilabel dataset from Ibrohim & Budi (2019). One challenge in multilabel classification is exploiting the correlation between labels. Imbalanced data can also degrade model performance in multilabel classification. This research therefore focuses on both multilabel classification and the handling of imbalanced data.
This research adapts two approaches: the Classifier Chain (CC) and a Deep Neural Network (DNN). The CC adaptation consists of determining the label order that yields the best performance. The DNN architecture is adapted from the CNN-Dense model. To handle imbalanced data, the Multilabel Synthetic Minority Oversampling Technique (MLSMOTE) is applied to the baseline models and to the models developed in this research.
The test results show that the developed DNN model performs statistically worse than both baseline models. The developed CC adaptation is statistically better than the first baseline model but shows no significant difference from the second. Label order strongly affects CC performance. MLSMOTE was unable to handle imbalanced data on a dataset with high inter-label dependencies, so it showed no significant effect on the models.
Keywords: hate speech; multilabel text classification; deep neural network; MLSMOTE; imbalanced data; example-based accuracy |
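The abstract names two ideas that a small example may make more concrete: a classifier chain whose result depends on label order, and example-based accuracy as the evaluation metric. The sketch below is illustrative only and is not the thesis implementation. It assumes a 1-nearest-neighbour base learner, a toy dataset with three unnamed labels, and the convention that an example with no true and no predicted labels scores 1; the names `TinyClassifierChain`, `nearest_label`, and `example_based_accuracy` are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence


def example_based_accuracy(y_true: Sequence[Sequence[int]],
                           y_pred: Sequence[Sequence[int]]) -> float:
    """Mean over examples of |true AND pred| / |true OR pred| (Jaccard per example)."""
    scores = []
    for t, p in zip(y_true, y_pred):
        inter = sum(1 for a, b in zip(t, p) if a == 1 and b == 1)
        union = sum(1 for a, b in zip(t, p) if a == 1 or b == 1)
        # Assumption: an example with no labels on either side counts as correct.
        scores.append(1.0 if union == 0 else inter / union)
    return sum(scores) / len(scores)


def nearest_label(train_x: List[List[float]], train_y: List[int],
                  query: List[float]) -> int:
    """Assumed base learner: label of the closest training row (squared Euclidean)."""
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)))
    return train_y[best]


class TinyClassifierChain:
    """One binary model per label; each model also sees the labels earlier in the chain."""

    def __init__(self, order: Sequence[int]):
        self.order = list(order)
        self.models = []

    def fit(self, X, Y):
        self.models = []
        for pos, label in enumerate(self.order):
            earlier = self.order[:pos]
            # Training features: original features plus TRUE values of earlier labels.
            aug = [list(x) + [y[e] for e in earlier] for x, y in zip(X, Y)]
            target = [y[label] for y in Y]
            self.models.append((aug, target))
        return self

    def predict(self, X):
        out = []
        for x in X:
            pred = [0] * len(self.order)
            feats = list(x)
            for (aug, target), label in zip(self.models, self.order):
                pred[label] = nearest_label(aug, target, feats)
                # Prediction features: append the PREDICTED value for later links.
                feats.append(pred[label])
            out.append(pred)
        return out


# Toy data (hypothetical): two features, three labels.
X_train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y_train = [[0, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1]]
X_test = [[0.1, 0.9], [0.9, 0.2]]
Y_test = [[1, 0, 1], [0, 1, 0]]

# Exhaustive search over label orders, scored by example-based accuracy.
results = {}
for order in permutations(range(3)):
    chain = TinyClassifierChain(order).fit(X_train, Y_train)
    results[order] = example_based_accuracy(Y_test, chain.predict(X_test))

best_order = max(results, key=results.get)
print("best order:", best_order, "score:", results[best_order])
```

Exhaustive search over orders is only feasible for a handful of labels, since the number of permutations grows factorially. On data this small the orders may well tie; the point is the mechanism, in which each link's input depends on the predictions of the links before it, rather than the numbers.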