HATE SPEECH DETECTION USING MACHINE LEARNING IN INDONESIAN LANGUAGE

Hate speech is an utterance that can incite hatred toward a person or a group of people and thereby harm them. It occurs in many countries, including Indonesia, and has a negative impact on those who hear it. Manual checking is d...

Bibliographic Details
Main Author:	LEONARDO SUTEJO (NIM: 13514022), TAUFIC
Format:	Final Project
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/31238
Institution:	Institut Teknologi Bandung

id	id-itb.:31238
spelling	id-itb.:31238 2018-06-26T09:42:07Z HATE SPEECH DETECTION USING MACHINE LEARNING IN INDONESIAN LANGUAGE LEONARDO SUTEJO (NIM: 13514022), TAUFIC Indonesia Final Project INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/31238 text
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	Hate speech is an utterance that can incite hatred toward a person or a group of people and thereby harm them. It occurs in many countries, including Indonesia, and has a negative impact on those who hear it. At present the police check for hate speech manually so that it does not harm others, so an automatic hate speech detection system is needed. This final project uses machine learning to build hate speech detection models. The experiments use Random Forest, SVM, and LSTM models with unigram, bigram, and trigram features, combinations of these n-gram features, and word embedding features. They also use acoustic features: prosody features, MFCC features, INTERSPEECH 2009 features, and INTERSPEECH 2010 features. In addition, a combined feature set joins the unigram and word embedding features with all of the acoustic features. The data consist of 1000 hate speech sentences and 1000 non-hate-speech sentences, with a test set of 100 hate speech sentences and 100 non-hate-speech sentences. Each model is evaluated by classifying the test data and computing its F1-score; the model with the highest F1-score is considered the best. Based on the experimental results, the best textual model is the LSTM model with the word embedding feature (F1-score 87.98%), while the best acoustic model is the Random Forest model with the INTERSPEECH 2010 feature (F1-score 82.5%). The best combined model is the LSTM model with the word embedding and INTERSPEECH 2009 features (F1-score 86.98%).
format	Final Project
author	LEONARDO SUTEJO (NIM: 13514022), TAUFIC
spellingShingle	LEONARDO SUTEJO (NIM: 13514022), TAUFIC HATE SPEECH DETECTION USING MACHINE LEARNING IN INDONESIAN LANGUAGE
author_facet	LEONARDO SUTEJO (NIM: 13514022), TAUFIC
author_sort	LEONARDO SUTEJO (NIM: 13514022), TAUFIC
title	HATE SPEECH DETECTION USING MACHINE LEARNING IN INDONESIAN LANGUAGE
title_short	HATE SPEECH DETECTION USING MACHINE LEARNING IN INDONESIAN LANGUAGE
title_full	HATE SPEECH DETECTION USING MACHINE LEARNING IN INDONESIAN LANGUAGE
title_fullStr	HATE SPEECH DETECTION USING MACHINE LEARNING IN INDONESIAN LANGUAGE
title_full_unstemmed	HATE SPEECH DETECTION USING MACHINE LEARNING IN INDONESIAN LANGUAGE
title_sort	hate speech detection using machine learning in indonesian language
url	https://digilib.itb.ac.id/gdl/view/31238
_version_	1822923523680632832
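
The description above mentions unigram, bigram, and trigram features and evaluation by F1-score. As a rough illustration only, the sketch below shows one way such n-gram counts and a binary F1-score could be computed in plain Python. The function names, the whitespace tokenization, the toy sentence, and the toy labels are all assumptions made for this example; the record does not state how the thesis tokenized text or whether its F1-score was computed for the hate speech class alone or averaged over both classes.

```python
from collections import Counter


def extract_ngrams(text, n_values=(1, 2, 3)):
    """Count word n-grams (unigrams, bigrams, trigrams by default).

    Assumption: simple lowercase + whitespace tokenization.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one class.

    Assumption: the positive class (label 1) stands for hate speech.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy inputs invented for this sketch; they are not data from the thesis.
features = extract_ngrams("contoh kalimat uji sederhana")
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
score = f1_score(y_true, y_pred)
```

In a setup like the one described, the n-gram counts would be turned into feature vectors for a classifier such as Random Forest or SVM, and the F1-score would be computed on the held-out test sentences.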

Similar Items