HATE SPEECH DETECTION USING MACHINE LEARNING IN INDONESIAN LANGUAGE

Bibliographic Details
Main Author: LEONARDO SUTEJO (NIM: 13514022), TAUFIC
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/31238
Institution: Institut Teknologi Bandung
Record date: 2018-06-26T09:42:07Z
Building: Institut Teknologi Bandung Library
Country: Indonesia
Collection: Digital ITB
Description: Hate speech is an utterance that can incite hatred toward a person or a group of people and harm them. Hate speech occurs in many countries, including Indonesia, and has a negative impact on those who hear it. Currently, the police check for hate speech manually to keep it from harming others, so an automatic hate speech detection system is needed. In this final project, machine learning is used to build hate speech detection models. The experiments use Random Forest, SVM, and LSTM models with textual features: unigrams, bigrams, trigrams, combinations of these n-grams, and word embeddings. They also use acoustic features: prosodic features, MFCC features, and the INTERSPEECH 2009 and INTERSPEECH 2010 feature sets. In addition, combined features are formed by joining the unigram and word embedding features with the acoustic features. The data consist of 1,000 hate speech sentences and 1,000 non-hate-speech sentences, with a test set of 100 hate speech utterances and 100 non-hate-speech utterances. Each model is evaluated by classifying the test data and computing its F1-score; the model with the highest F1-score is considered the best. The best-performing textual model is LSTM with word embedding features (F1-score 87.98%), and the best-performing acoustic model is Random Forest with INTERSPEECH 2010 features (F1-score 82.5%). The best-performing combined model is LSTM with word embedding and INTERSPEECH 2009 features (F1-score 86.98%).
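The abstract names n-gram text features and the F1-score but does not say how either was computed. The Python sketch below shows one common way to do both. It assumes lowercase whitespace tokenization and binary labels (1 = hate speech, 0 = not hate speech). The function names `ngram_features` and `f1_score` are hypothetical and do not come from the project. The abstract also does not say whether its reported F1-scores are for the hate speech class or averaged over both classes; this sketch computes F1 for the positive class only.

```python
from collections import Counter
from typing import Sequence


def ngram_features(sentence: str, max_n: int = 3) -> Counter:
    """Count all n-grams from unigrams up to max_n-grams (default: trigrams).

    Assumption: simple lowercase whitespace tokenization; the project's
    actual preprocessing of Indonesian text is not described.
    """
    tokens = sentence.lower().split()
    features = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of length n over the token list.
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy, non-hateful Indonesian sentence ("you are very kind").
    print(sorted(ngram_features("kamu sangat baik")))
    # Toy labels only; these are not results from the project.
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

For the three-word toy sentence, the sketch yields six features: three unigrams, two bigrams, and one trigram. In practice, count vectors like these would be fed to a classifier such as Random Forest or SVM. The helper does not reproduce the project's pipeline.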