PERFORMANCE IMPROVEMENT OF HATE SPEECH DETECTION FOR HATEFUL STATEMENTS USING CONTEXTUAL PREPROCESSING AND FINE-TUNING STRATEGY ON PRE-TRAINED LANGUAGE MODEL (BERT)

Bibliographic Details
Main Author: Donny Ericson, Muhammad
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/80974
Institution: Institut Teknologi Bandung
Description
Summary: The growing volume of social media content has been accompanied by a growing prevalence of hate speech. This makes it difficult to distinguish protected freedom of speech from expressions that incite hatred, so an accurate system for identifying hate speech content is required. However, bias introduced during system development leads to inaccuracies in identifying content that should be considered hate speech. To overcome this, clear standards for what counts as hate speech are needed, which reduces the risk of bias in the detection process. This research formulates hate speech criteria based on the concepts of speech, hatred, and hate speech itself. In the first stage, the criteria are translated into a linguistic context and implemented as an algorithm in the natural language processing preprocessing step, which automatically labels data according to the formulated criteria. In the next stage, a BERT-based pre-trained language model is fine-tuned to adapt it to domains relevant to hate speech. The model was evaluated on accuracy, precision, recall, and F1-score, together with an analysis of the bias reduction the model might produce. The research produced a model with higher accuracy, precision, and recall than previous research. The success of the model is attributed to firmer and more linguistically specific hate speech criteria, which allow the model to identify hateful content more precisely and significantly improve detection performance.
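
The four evaluation measures named in the summary can be illustrated with a minimal sketch. It assumes a binary labeling scheme (1 = hate speech, 0 = not hate speech), which the record does not confirm. The function name `binary_metrics` and the sample labels are hypothetical and are not taken from the thesis.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumption (not from the thesis): label 1 means "hate speech",
    label 0 means "not hate speech".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one example is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold labels and model predictions, for illustration only.
gold = [1, 1, 1, 0, 0, 0, 1, 0]
pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = binary_metrics(gold, pred)
```

Precision falls when non-hateful content is flagged as hate speech, while recall falls when hateful content is missed, so reporting both alongside accuracy gives a fuller picture than accuracy alone.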