MULTI-LABEL CLASSIFICATION OF HATE SPEECH AND ABUSIVE LANGUAGE IN INDONESIAN TWITTER

Bibliographic Details
Main Author:	Raihan Asyraf Desanto, Muhammad
Format:	Final Project
Language:	Indonesian
Online Access:	https://digilib.itb.ac.id/gdl/view/64105
Institution:	Institut Teknologi Bandung

Description
Summary:	ABSTRACT. Multi-label Classification of Hate Speech and Abusive Language in Indonesian Twitter. Muhammad Raihan Asyraf Desanto, NIM: 13517027. Many studies have addressed hate speech detection, but they vary widely in how they define the labels. This study uses the multilabel dataset of Ibrohim & Budi (2019). One challenge in multilabel classification is exploiting the correlation between labels. Imbalanced data can also degrade model performance in multilabel classification. This research therefore focuses on both multilabel classification and the handling of imbalanced data. We use adaptations of the Classifier Chain (CC) and of a Deep Neural Network (DNN). The CC adaptation determines the label order that produces the best performance. The DNN architecture is adapted from the CNN-Dense model. To handle imbalanced data, we adapt the Multilabel Synthetic Minority Oversampling Technique (MLSMOTE) and apply it to the baseline models and to the models developed in this research. The test results show that the developed DNN model is statistically worse than both baseline models. The developed CC adaptation is statistically better than the first baseline model but not significantly different from the second. The label order strongly affects the performance of the CC. MLSMOTE was unable to handle imbalanced data on datasets with high inter-label dependencies, and so showed no significant effect on the models. Keywords: hate speech; multilabel text classification; deep neural network; MLSMOTE; imbalanced data; example-based accuracy
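
The summary names example-based accuracy as its metric and says the CC adaptation depends on choosing a good label order, but it gives neither the exact formula nor the search procedure. The sketch below is therefore an illustration under assumptions, not the thesis's method: it uses one common definition of example-based accuracy (per-example intersection over union of the true and predicted label sets, averaged over examples), scores an example with no true and no predicted labels as 1.0, and searches label orders exhaustively. The names `example_based_accuracy`, `best_label_order`, and the `evaluate` callback are hypothetical.

```python
from itertools import permutations
from typing import Callable, Sequence, Tuple


def example_based_accuracy(
    y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]
) -> float:
    """Average over examples of |true AND pred| / |true OR pred|.

    Rows are binary label indicator vectors. An example with no true and
    no predicted labels is scored 1.0 (an assumed convention).
    """
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    total = 0.0
    for true_row, pred_row in zip(y_true, y_pred):
        both = sum(1 for t, p in zip(true_row, pred_row) if t == 1 and p == 1)
        either = sum(1 for t, p in zip(true_row, pred_row) if t == 1 or p == 1)
        total += 1.0 if either == 0 else both / either
    return total / len(y_true)


def best_label_order(
    n_labels: int, evaluate: Callable[[Tuple[int, ...]], float]
) -> Tuple[int, ...]:
    """Return the label permutation with the highest validation score.

    `evaluate(order)` is a hypothetical callback that trains a classifier
    chain with the given label order and returns its validation score.
    Exhaustive search costs n_labels! evaluations, so it is only practical
    for a small number of labels.
    """
    return max(permutations(range(n_labels)), key=evaluate)
```

In practice `evaluate` would wrap the training of a classifier chain (for instance with an existing library implementation) and return `example_based_accuracy` on held-out data. For larger label sets, a heuristic or sampled search over orders would have to replace the exhaustive loop.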

Similar Items