Deep learning techniques for hate speech detection
Considering the prevalence of hate speech on social media platforms, automatic hate speech detection is a crucial tool in the fight against its proliferation. Several techniques, most recently deep learning-based methods, have been developed for the task, and different datasets have been created that represent different facets of the hate speech detection problem. Using three prominent public datasets, this study presents a comprehensive empirical analysis of hate speech detection techniques. Implementing and comparing current models yielded pivotal insights into the efficacy of machine learning models, word representation methods, and their performance variance across datasets. Convolutional Neural Networks (CNN) emerged as a consistent performer, especially when coupled with Bidirectional Encoder Representations from Transformers (BERT) embeddings. The performance of the Multi-Layer Perceptron (MLP) was notably affected by the chosen word representation, with the BERT combination proving superior. The word representation evaluation underscored BERT's superior capability, attributable to its pre-training on extensive corpora and its contextual word representations, which outclassed fixed representations such as Global Vectors for Word Representation (GloVe) and Term Frequency-Inverse Document Frequency (TF-IDF). Despite BERT's strengths, its low macro-average scores highlight the challenge of accurately identifying the minority of hateful tweets amid vast tweet volumes.
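The pairing the abstract reports as most consistent, a convolutional text classifier over BERT's contextual token embeddings, can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the checkpoint name (bert-base-uncased), the frozen-BERT setup, the filter sizes, and the binary label space are all assumptions made for the example.

```python
# Minimal sketch (assumed setup, not the project's code): a text-CNN classifier
# over frozen BERT token embeddings, the model/representation pairing the study
# reports as the most consistent performer.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class BertCNNClassifier(nn.Module):
    def __init__(self, bert_name="bert-base-uncased", num_classes=2,
                 kernel_sizes=(3, 4, 5), num_filters=100):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)
        for p in self.bert.parameters():        # freeze BERT; train only the CNN head
            p.requires_grad = False
        hidden = self.bert.config.hidden_size   # 768 for bert-base
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden, num_filters, k) for k in kernel_sizes])
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, input_ids, attention_mask):
        # Contextual token embeddings: (batch, seq_len, hidden)
        tokens = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        x = tokens.transpose(1, 2)               # Conv1d expects (batch, hidden, seq_len)
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))  # class logits


tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertCNNClassifier().eval()
batch = tokenizer(["an example tweet for the classifier to score"],
                  return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    logits = model(batch["input_ids"], batch["attention_mask"])
```

On the evaluation side, the macro-averaged scores the abstract refers to (e.g., scikit-learn's f1_score with average="macro") weight the minority hateful class equally with the majority class, so missed hateful tweets depress the macro score even when overall accuracy looks high.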
Main Author: | Sam, Jared Mun Kit
---|---
Other Authors: | Luu Anh Tuan (anhtuan.luu@ntu.edu.sg)
School: | School of Computer Science and Engineering
Format: | Final Year Project (FYP)
Degree: | Bachelor of Engineering (Computer Science)
Project Code: | SCSE22-1111
Language: | English
Published: | Nanyang Technological University, 2023
Subjects: | Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Online Access: | https://hdl.handle.net/10356/172646
Institution: | Nanyang Technological University
Citation: | Sam, J. M. K. (2023). Deep learning techniques for hate speech detection. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/172646