Deep learning techniques for hate speech detection

Considering the prevalence of hate speech on social media platforms, automatic hate speech detection is a crucial tool in the fight against hate speech proliferation. Several techniques have been developed for the task, including a recent surge of deep learning-based methods. Different datasets tha...

Bibliographic Details
Main Author: Sam, Jared Mun Kit
Other Authors: Luu Anh Tuan
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
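Subjects: Engineering::Computer science and engineering::Computing methodologies::Document and text processing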
Online Access: https://hdl.handle.net/10356/172646
Institution: Nanyang Technological University
id sg-ntu-dr.10356-172646
record_format dspace
spelling sg-ntu-dr.10356-172646
2023-12-22T15:38:13Z
Deep learning techniques for hate speech detection
Sam, Jared Mun Kit
Luu Anh Tuan
School of Computer Science and Engineering
anhtuan.luu@ntu.edu.sg
Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Bachelor of Engineering (Computer Science)
2023-12-19T11:18:52Z
2023-12-19T11:18:52Z
2023
Final Year Project (FYP)
Sam, J. M. K. (2023). Deep learning techniques for hate speech detection. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/172646
https://hdl.handle.net/10356/172646
en
SCSE22-1111
application/pdf
Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Document and text processing
description Considering the prevalence of hate speech on social media platforms, automatic hate speech detection is a crucial tool in the fight against hate speech proliferation. Several techniques have been developed for the task, including a recent surge of deep learning-based methods. Different datasets that represent different facets of the hate speech detection problem have also been created. This study presents a comprehensive empirical analysis of hate speech detection techniques on three prominent public datasets. Implementing and comparing current models offered pivotal insights into the efficacy of machine learning models and word representation models, and into how their performance varies across datasets. Convolutional Neural Networks (CNN) emerged as a consistent performer, especially when coupled with Bidirectional Encoder Representations from Transformers (BERT) embeddings. The performance of the Multi-Layer Perceptron (MLP) was notably affected by the chosen word representation method, with the BERT combination performing best. The word representation evaluation underscored BERT’s superior capability, attributable to its pre-training on extensive corpora and its contextual word representations, which outperformed fixed representations such as Global Vectors for Word Representation (GloVe) and Term Frequency-Inverse Document Frequency (TF-IDF). Despite BERT’s strengths, its low macro-average scores highlight the difficulty of accurately identifying the minority class of hateful tweets within large tweet volumes.
author2 Luu Anh Tuan
author_facet Luu Anh Tuan
Sam, Jared Mun Kit
format Final Year Project
author Sam, Jared Mun Kit
author_sort Sam, Jared Mun Kit
title Deep learning techniques for hate speech detection
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/172646
_version_ 1787136653325762560