MULTI-LABEL CLASSIFICATION OF HATE SPEECH AND ABUSIVE LANGUAGE IN INDONESIAN TWITTER


Bibliographic Details
Main Author: Raihan Asyraf Desanto, Muhammad
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/64105
Institution: Institut Teknologi Bandung
id id-itb.:64105
spelling id-itb.:64105 2022-03-29T09:28:57Z
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description ABSTRACT. Multi-label Classification of Hate Speech and Abusive Language in Indonesian Twitter. Muhammad Raihan Asyraf Desanto, NIM: 13517027. There have been many studies on hate speech detection, but they vary widely in how they define the labels. This study uses the multilabel dataset from Ibrohim & Budi (2019). One challenge in multilabel classification is exploiting the correlation between labels. Imbalanced data can also degrade model performance in multilabel classification. This research therefore addresses both multilabel classification and imbalanced data. We use adaptations of the Classifier Chain (CC) and of a Deep Neural Network (DNN). The CC adaptation determines the label order that yields the best performance. The DNN architecture is adapted from the CNN-Dense model. To handle imbalanced data, the research adapts the Multilabel Synthetic Minority Oversampling Technique (MLSMOTE), applying it to the baseline model and the model developed in this research. The test results show that the developed DNN model is statistically worse than the two baseline models. The developed CC adaptation is statistically better than the first baseline model but not significantly different from the second. The order of the labels greatly affects the performance of the CC. MLSMOTE was not able to handle the imbalanced data in this dataset, which has high inter-label dependencies, and so showed no significant effect on the model. Keywords: hate speech; multilabel text classification; deep neural network; MLSMOTE; imbalanced data; example-based accuracy
format Final Project
author Raihan Asyraf Desanto, Muhammad
title MULTI-LABEL CLASSIFICATION OF HATE SPEECH AND ABUSIVE LANGUAGE IN INDONESIAN TWITTER
url https://digilib.itb.ac.id/gdl/view/64105
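The abstract lists example-based accuracy among its keywords but does not spell the metric out. The sketch below shows one common definition, assumed here rather than taken from the thesis: for each tweet, the size of the intersection of the true and predicted label sets divided by the size of their union, averaged over all tweets. The function name, the label names, and the handling of empty label sets are illustrative assumptions.

```python
from typing import Iterable, Sequence, Set


def example_based_accuracy(
    y_true: Sequence[Iterable[str]],
    y_pred: Sequence[Iterable[str]],
) -> float:
    """Mean per-example Jaccard overlap between true and predicted label sets."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    total = 0.0
    for true_labels, pred_labels in zip(y_true, y_pred):
        t: Set[str] = set(true_labels)
        p: Set[str] = set(pred_labels)
        union = t | p
        # Assumption: an example with no true and no predicted labels
        # counts as fully correct (score 1.0).
        total += 1.0 if not union else len(t & p) / len(union)
    return total / len(y_true)


# Hypothetical label names, for illustration only.
truth = [{"hate_speech", "abusive"}, {"abusive"}, set()]
preds = [{"hate_speech"}, {"abusive"}, set()]
print(example_based_accuracy(truth, preds))
```

Under this definition a prediction that recovers only one of two true labels earns partial credit for that tweet, which is why the metric is often preferred over exact-match accuracy when labels are strongly correlated.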