PERFORMANCE IMPROVEMENT OF HATE SPEECH DETECTION FOR HATEFUL STATEMENTS USING CONTEXTUAL PREPROCESSING AND FINE-TUNING STRATEGY ON PRE-TRAINED LANGUAGE MODEL (BERT)

The growing volume of social media content coincides with an increasing prevalence of hate speech. This poses the challenge of distinguishing between protected freedom of speech and expressions that incite hatred, so an accurate system for identifying hate-speech content is required. However, bias introduced during system development leads to inaccuracies in identifying content that should be classified as hate speech. To overcome this, clear standards for the criteria of hate speech must be established, reducing the risk of bias in the detection process. This research formulates hate-speech criteria based on the concepts of speech, hatred, and hate speech itself. In the initial stage, the criteria are translated into linguistic context and implemented as an algorithm in the natural language processing preprocessing step, which labels data automatically according to the formulated criteria. The next stage trains a BERT-based pre-trained language model and fine-tunes it to adapt the model to domains relevant to hate speech. The evaluation examines the accuracy, precision, recall, and F1-score of the developed model while analyzing the bias reduction the model may achieve. This research produced a model with higher accuracy, precision, and recall than previous work.


Bibliographic Details
Main Author: Donny Ericson, Muhammad
Format: Theses
Language: Indonesia
Online Access:https://digilib.itb.ac.id/gdl/view/80974
Institution: Institut Teknologi Bandung
Language: Indonesia
id id-itb.:80974
spelling id-itb.:80974 2024-03-17T04:39:42Z
title PERFORMANCE IMPROVEMENT OF HATE SPEECH DETECTION FOR HATEFUL STATEMENTS USING CONTEXTUAL PREPROCESSING AND FINE-TUNING STRATEGY ON PRE-TRAINED LANGUAGE MODEL (BERT)
author Donny Ericson, Muhammad
language Indonesia
format Theses
topic Preprocessing, Contextual Preprocessing, Hate Speech, BERT, Fine-Tuning
publisher INSTITUT TEKNOLOGI BANDUNG
url https://digilib.itb.ac.id/gdl/view/80974
description The growing volume of social media content coincides with an increasing prevalence of hate speech. This poses the challenge of distinguishing between protected freedom of speech and expressions that incite hatred, so an accurate system for identifying hate-speech content is required. However, bias introduced during system development leads to inaccuracies in identifying content that should be classified as hate speech. To overcome this, clear standards for the criteria of hate speech must be established, reducing the risk of bias in the detection process. This research formulates hate-speech criteria based on the concepts of speech, hatred, and hate speech itself. In the initial stage, the criteria are translated into linguistic context and implemented as an algorithm in the natural language processing preprocessing step, which labels data automatically according to the formulated criteria. The next stage trains a BERT-based pre-trained language model and fine-tunes it to adapt the model to domains relevant to hate speech. The evaluation examines the accuracy, precision, recall, and F1-score of the developed model while analyzing the bias reduction the model may achieve. This research produced a model with higher accuracy, precision, and recall than previous work. The model's success stems from more assertive and linguistically specific hate-speech criteria, which allow it to identify hateful content more precisely and significantly improve detection performance.
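The criteria-based automatic labeling described in this record can be pictured as a rule-driven preprocessor that marks a text as hate speech when a hostile expression targets a listed group. This is a minimal illustrative sketch, not the thesis's actual algorithm: `TARGET_GROUPS` and `HOSTILE_PATTERNS` are hypothetical placeholders standing in for the formulated linguistic criteria.

```python
import re

# Hypothetical placeholder criteria -- the thesis formulates its own
# linguistically grounded criteria; these lists are for illustration only.
TARGET_GROUPS = ["group_a", "group_b"]            # protected-group mentions
HOSTILE_PATTERNS = [r"\bhate\b", r"\bdestroy\b", r"\bget rid of\b"]

def normalize(text: str) -> str:
    """Basic preprocessing: lowercase and collapse whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def auto_label(text: str) -> int:
    """Return 1 (hate speech) when a hostile pattern co-occurs with a
    target-group mention, else 0 -- a stand-in for criteria-based labeling."""
    t = normalize(text)
    has_target = any(group in t for group in TARGET_GROUPS)
    has_hostility = any(re.search(p, t) for p in HOSTILE_PATTERNS)
    return int(has_target and has_hostility)
```

Labels produced this way would then serve as training data for the fine-tuned BERT classifier; the key design point is that the labeling decision is traceable to explicit criteria rather than to annotator judgment alone.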
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description The growing volume of social media content coincides with an increasing prevalence of hate speech. This poses the challenge of distinguishing between protected freedom of speech and expressions that incite hatred, so an accurate system for identifying hate-speech content is required. However, bias introduced during system development leads to inaccuracies in identifying content that should be classified as hate speech. To overcome this, clear standards for the criteria of hate speech must be established, reducing the risk of bias in the detection process. This research formulates hate-speech criteria based on the concepts of speech, hatred, and hate speech itself. In the initial stage, the criteria are translated into linguistic context and implemented as an algorithm in the natural language processing preprocessing step, which labels data automatically according to the formulated criteria. The next stage trains a BERT-based pre-trained language model and fine-tunes it to adapt the model to domains relevant to hate speech. The evaluation examines the accuracy, precision, recall, and F1-score of the developed model while analyzing the bias reduction the model may achieve. This research produced a model with higher accuracy, precision, and recall than previous work. The model's success stems from more assertive and linguistically specific hate-speech criteria, which allow it to identify hateful content more precisely and significantly improve detection performance.
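The evaluation metrics named in the description (accuracy, precision, recall, F1-score) can be computed for a binary hate-speech classifier as below; a minimal sketch assuming label 1 marks hate speech and label 0 marks non-hate speech.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for the positive
    (hate-speech) class from parallel lists of binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Precision and recall matter separately here: precision penalizes over-flagging legitimate speech, while recall penalizes missed hateful content, which is why the record reports both alongside accuracy and F1.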
format Theses
author Donny Ericson, Muhammad
spellingShingle Donny Ericson, Muhammad
PERFORMANCE IMPROVEMENT OF HATE SPEECH DETECTION FOR HATEFUL STATEMENTS USING CONTEXTUAL PREPROCESSING AND FINE-TUNING STRATEGY ON PRE-TRAINED LANGUAGE MODEL (BERT)
author_facet Donny Ericson, Muhammad
author_sort Donny Ericson, Muhammad
title PERFORMANCE IMPROVEMENT OF HATE SPEECH DETECTION FOR HATEFUL STATEMENTS USING CONTEXTUAL PREPROCESSING AND FINE-TUNING STRATEGY ON PRE-TRAINED LANGUAGE MODEL (BERT)
title_short PERFORMANCE IMPROVEMENT OF HATE SPEECH DETECTION FOR HATEFUL STATEMENTS USING CONTEXTUAL PREPROCESSING AND FINE-TUNING STRATEGY ON PRE-TRAINED LANGUAGE MODEL (BERT)
title_full PERFORMANCE IMPROVEMENT OF HATE SPEECH DETECTION FOR HATEFUL STATEMENTS USING CONTEXTUAL PREPROCESSING AND FINE-TUNING STRATEGY ON PRE-TRAINED LANGUAGE MODEL (BERT)
title_fullStr PERFORMANCE IMPROVEMENT OF HATE SPEECH DETECTION FOR HATEFUL STATEMENTS USING CONTEXTUAL PREPROCESSING AND FINE-TUNING STRATEGY ON PRE-TRAINED LANGUAGE MODEL (BERT)
title_full_unstemmed PERFORMANCE IMPROVEMENT OF HATE SPEECH DETECTION FOR HATEFUL STATEMENTS USING CONTEXTUAL PREPROCESSING AND FINE-TUNING STRATEGY ON PRE-TRAINED LANGUAGE MODEL (BERT)
title_sort performance improvement of hate speech detection for hateful statements using contextual preprocessing and fine-tuning strategy on pre-trained language model (bert)
url https://digilib.itb.ac.id/gdl/view/80974
_version_ 1822009339779481600