HATE SPEECH DETECTION USING MACHINE LEARNING IN INDONESIAN LANGUAGE
Hate speech is an utterance that has the potential to incite hatred toward a person or a group of people and to harm them. It occurs in many countries, including Indonesia, and has a negative impact on those who hear it. Manual checking is d...
Main Author: | LEONARDO SUTEJO (NIM: 13514022), TAUFIC |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/31238 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:31238 |
---|---|
spelling |
id-itb.:31238 2018-06-26T09:42:07Z HATE SPEECH DETECTION USING MACHINE LEARNING IN INDONESIAN LANGUAGE LEONARDO SUTEJO (NIM: 13514022), TAUFIC Indonesia Final Project INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/31238 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
Hate speech is an utterance that has the potential to incite hatred toward a person or a group of people and to harm them. It occurs in many countries, including Indonesia, and has a negative impact on those who hear it. The police currently check for hate speech manually so that it does not harm others, so an automatic hate speech detection system is needed. This final project uses machine learning to build hate speech detection models. The experiments use Random Forest, SVM, and LSTM models with textual features: unigrams, bigrams, trigrams, combinations of these n-gram features, and word embeddings. They also use acoustic features: prosody features, MFCC features, the INTERSPEECH 2009 feature set, and the INTERSPEECH 2010 feature set. In addition, combined features join unigram and word embedding features with all the acoustic features. The data consist of 1000 hate speech sentences and 1000 non-hate-speech sentences, with test data of 100 hate speech sentences and 100 non-hate-speech sentences. Models are evaluated by classifying the test data and measuring the F1-score; the model with the highest F1-score is considered the best. In the experiments, the best-performing textual model is the LSTM with word embedding features (F1-score 87.98%), and the best acoustic model is Random Forest with INTERSPEECH 2010 features (F1-score 82.5%). The best-performing combined model is the LSTM with word embedding and INTERSPEECH 2009 features (F1-score 86.98%). |
format |
Final Project |
author |
LEONARDO SUTEJO (NIM: 13514022), TAUFIC |
title |
HATE SPEECH DETECTION USING MACHINE LEARNING IN INDONESIAN LANGUAGE |
url |
https://digilib.itb.ac.id/gdl/view/31238 |
_version_ |
1822923523680632832 |
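
The abstract names word n-gram features (unigram, bigram, trigram) and the F1-score as the evaluation metric. The following is a minimal sketch of both in plain Python, not the author's implementation. The function names `word_ngrams` and `f1_score`, the placeholder sentence, and the toy labels are all hypothetical, and it assumes F1 is computed for the hate speech class as the positive class, which the record does not state.

```python
from typing import List, Sequence, Tuple


def word_ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous word n-grams of length n (n=1 unigram, n=2 bigram, n=3 trigram)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1-score for the class labelled `positive` (assumed here: 1 = hate speech)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Placeholder sentence (made up, not from the thesis data).
    tokens = "ini hanya contoh kalimat".split()
    for n in (1, 2, 3):
        print(n, word_ngrams(tokens, n))

    # Toy labels (made up): 1 = hate speech, 0 = not hate speech.
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0]
    print(f"F1 = {f1_score(y_true, y_pred):.3f}")
```

In a setup like the one the abstract describes, counts of such n-grams would form the feature vector passed to a classifier, and the F1-score would be computed on the held-out test sentences.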