Deep learning techniques for hate speech detection

With the rapid growth of the Internet and the continuous expansion of online content, hate speech is also proliferating. Hate speech has severe implications for social polarization and for physical and mental safety, creating an urgent need for effective automated detection. In this s...

Bibliographic Details
Main Author: Chang, Timothy Zu'En
Other Authors: Luu Anh Tuan
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects: Computer and Information Science
Online Access: https://hdl.handle.net/10356/175272
Institution: Nanyang Technological University
id sg-ntu-dr.10356-175272
record_format dspace
spelling sg-ntu-dr.10356-175272 2024-04-26T15:44:03Z Deep learning techniques for hate speech detection Chang, Timothy Zu'En Luu Anh Tuan School of Computer Science and Engineering anhtuan.luu@ntu.edu.sg Computer and Information Science Bachelor's degree 2024-04-23T05:17:27Z 2024-04-23T05:17:27Z 2024 Final Year Project (FYP) Chang, T. Z. (2024). Deep learning techniques for hate speech detection. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175272 https://hdl.handle.net/10356/175272 en SCSE23-0732 application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
description With the rapid growth of the Internet and the continuous expansion of online content, hate speech is also proliferating. Hate speech has severe implications for social polarization and for physical and mental safety, creating an urgent need for effective automated detection. In this study, we analyze current efforts by the scientific community to develop automated methods for detecting online hate speech. This led us to machine learning-based approaches, in particular deep learning approaches, which have become popular for their robustness and their ability to learn newly evolving slang. We also investigate the challenges involved, including language nuances, varying definitions of hate speech, and data constraints. The study aims to answer the following questions: How do we define hate speech and distinguish it from other classes of speech? What is the scientific community currently doing to detect it? Lastly, how effective are current methods at classifying hate speech? We conducted a literature review on three topics. First, we reviewed the various definitions of hate speech, which provided insight into how hate speech can be distinguished from normal speech and into the evolutions in language that make it challenging to identify. Second, we reviewed natural language processing (NLP) methodology, examining existing studies on the use of NLP in hate speech detection to learn about popular and state-of-the-art methods. Lastly, we studied benchmark datasets collated by the research community that could be used in our own experiments. We then followed the NLP pipeline in introducing popular machine learning techniques for hate speech classification, including the text processing and representation techniques used to make text data understandable by models.
Experiments were then conducted using a mix of feature engineering and traditional machine learning classifiers to obtain a set of baseline classification metrics. Deep learning approaches such as neural networks and transformer models were then introduced in an attempt to outperform this baseline. Additionally, we evaluated the performance of neural network-based models and pre-trained models and attempted to reach a conclusion on the most suitable strategies for detecting hate speech. Ultimately, this study provides a benchmarked assessment of existing deep learning techniques and their use in hate speech detection, in the hope of contributing valuable knowledge to the research community. By improving our understanding of the strengths and limitations of various models and pre-trained language models, it aims to contribute to a more inclusive and secure online community and to serve as a step towards mitigating the adverse impact of hate speech on digital platforms.
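The text-representation and baseline-metric steps mentioned in the description can be illustrated with a minimal sketch. This is not the project's code: the function names (`tokenize`, `tfidf`, `precision_recall_f1`), the cleaning rules, and the smoothed IDF formula are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text, drop URLs and @mentions, and keep alphabetic tokens.

    The cleaning rules here are illustrative assumptions, not the study's.
    """
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    return re.findall(r"[a-z']+", text)


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * (ln((1 + n) / (1 + df)) + 1),
    where n is the number of documents and df the document frequency.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            term: (count / len(toks)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

In a pipeline of this kind, the vectors from `tfidf` would be passed to a traditional classifier to produce baseline predictions, and `precision_recall_f1` would then score those predictions against the gold labels so that later models can be compared on the same metrics.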
author2 Luu Anh Tuan
author_facet Luu Anh Tuan
Chang, Timothy Zu'En
format Final Year Project
author Chang, Timothy Zu'En
author_sort Chang, Timothy Zu'En
title Deep learning techniques for hate speech detection
publisher Nanyang Technological University
publishDate 2024
url https://hdl.handle.net/10356/175272
_version_ 1800916301079642112