Deep learning techniques for hate speech detection


Bibliographic Details
Main Author: Lee, Yuan Cheng
Other Authors: Luu Anh Tuan
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Online Access: https://hdl.handle.net/10356/171929
Institution: Nanyang Technological University
Description
Summary: In recent years, hate speech has grown significantly on social media and has become a major issue that needs to be tackled urgently. One countermeasure is to use artificial intelligence to remove hate speech promptly, before it can spread and go viral. Deep learning, a subset of artificial intelligence, is the state-of-the-art technology for Natural Language Processing (NLP) tasks and has shown promising results. However, finding the model best suited to hate speech detection remains a challenge. In this paper, deep learning pipelines are examined and discussed to give a more comprehensive understanding of their application in hate speech detection, covering the datasets used, feature engineering techniques, deep learning architectures, the training process, and the evaluation of the models. The datasets used are freely available on the internet and include the Gab Hate Corpus, the Implicit Hate Corpus and SE2019. The feature engineering techniques examined are word embedding methods such as Word2Vec, FastText and GloVe. The deep learning architectures examined include the Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer (GPT). This study aims to give the research community a comprehensive understanding of deep learning pipelines for hate speech detection. The results offer insight into the effectiveness of the various datasets, word embeddings and deep learning models, and can in turn serve as a guiding resource for future researchers selecting the most suitable models for hate speech detection.