PERFORMANCE IMPROVEMENT OF HATE SPEECH DETECTION FOR HATEFUL STATEMENTS USING CONTEXTUAL PREPROCESSING AND FINE-TUNING STRATEGY ON PRE-TRAINED LANGUAGE MODEL (BERT)
Saved in:
Main Author: Donny Ericson, Muhammad
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/80974
Institution: Institut Teknologi Bandung
id
id-itb.:80974
spelling
id-itb.:80974 (last updated 2024-03-17T04:39:42Z)
topic
Preprocessing, Contextual Preprocessing, Hate Speech, BERT, Fine-Tuning
institution
Institut Teknologi Bandung
building
Institut Teknologi Bandung Library
continent
Asia
country
Indonesia
content_provider
Institut Teknologi Bandung
collection
Digital ITB
language
Indonesian
description
The growing volume of social media content has been accompanied by a growing prevalence of hate speech. This poses the challenge of distinguishing between recognized freedom of expression and speech that incites hatred, so an accurate system for identifying hate speech content is required. However, bias introduced during system development leads to inaccuracies in identifying content that should be classified as hate speech. To overcome this, it is important to set clear standards for the criteria of hate speech, thereby reducing the risk of bias in the detection process. This research formulates hate speech criteria based on the concepts of speech, hatred, and hate speech itself. In the initial stage, the criteria are translated into linguistic context and implemented as an algorithm in the natural language processing preprocessing step, which labels data automatically according to the formulated criteria. The next stage trains a BERT-based pre-trained language model and fine-tunes it to adapt the model to domains relevant to hate speech. The evaluation examines the accuracy, precision, recall, and F1-score of the developed model, alongside an analysis of the bias reduction the model may achieve. This research produced a model with higher accuracy, precision, and recall than previous work. The model's success is due to more assertive and linguistically specific hate speech criteria, which allow it to identify hateful content more precisely and significantly improve detection performance.
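The abstract describes a three-step pipeline: criteria-based automatic labeling during preprocessing, fine-tuning of a pre-trained BERT model, and evaluation with accuracy, precision, recall, and F1-score. The thesis full text is not part of this record, so the Python sketch below is only a minimal hypothetical illustration of such a pipeline, built on the Hugging Face transformers and datasets libraries and scikit-learn; the regex criteria, the label_by_criteria helper, the IndoBERT checkpoint, and all hyperparameters are assumptions for illustration, not the author's actual implementation.

import re
from datasets import Dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# (1) Contextual preprocessing: auto-label a text as hate speech only when
# a hostile expression co-occurs with a reference to a target group -- a
# toy stand-in for the linguistically specific criteria the thesis formulates.
HOSTILE = re.compile(r"\b(hate|destroy|expel)\b", re.IGNORECASE)
TARGET = re.compile(r"\b(ethnic|religion|race|group)\b", re.IGNORECASE)

def label_by_criteria(text: str) -> int:
    """Return 1 (hate speech) iff hostility is directed at a group."""
    return int(bool(HOSTILE.search(text)) and bool(TARGET.search(text)))

texts = ["We should expel that ethnic group", "I hate rainy mornings"]
data = Dataset.from_dict({"text": texts,
                          "label": [label_by_criteria(t) for t in texts]})

# (2) Fine-tune a pre-trained BERT. IndoBERT is a plausible checkpoint for
# Indonesian-language data; the record does not name the one actually used.
MODEL = "indobenchmark/indobert-base-p1"
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)
data = data.map(lambda b: tok(b["text"], truncation=True,
                              padding="max_length", max_length=128),
                batched=True)

# (3) Evaluate with accuracy, precision, recall, and F1-score, the four
# metrics named in the abstract.
def metrics(eval_pred):
    logits, labels = eval_pred
    preds = logits.argmax(axis=-1)
    p, r, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary", zero_division=0)
    return {"accuracy": accuracy_score(labels, preds),
            "precision": p, "recall": r, "f1": f1}

trainer = Trainer(model=model,
                  args=TrainingArguments(output_dir="hate-speech-bert",
                                         num_train_epochs=3),
                  train_dataset=data, eval_dataset=data,
                  compute_metrics=metrics)
trainer.train()
print(trainer.evaluate())

In practice, the labeling rules would encode the thesis's formulated linguistic criteria rather than two toy regexes, and the model would be fine-tuned and evaluated on a labeled hate speech corpus rather than two sentences.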
format
Theses
author
Donny Ericson, Muhammad
title
PERFORMANCE IMPROVEMENT OF HATE SPEECH DETECTION FOR HATEFUL STATEMENTS USING CONTEXTUAL PREPROCESSING AND FINE-TUNING STRATEGY ON PRE-TRAINED LANGUAGE MODEL (BERT)
url
https://digilib.itb.ac.id/gdl/view/80974
_version_
1822009339779481600