Improving spam detection on Twitter using deep learning

Bibliographic Details
Main Author: Ng, Yi Rong
Other Authors: Ponnuthurai Nagaratnam Suganthan
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2021
Online Access: https://hdl.handle.net/10356/148957
Institution: Nanyang Technological University
Description
Summary: Advances in technology have made social media easy for Internet users to access. However, the number of content polluters, also known as spammers, has increased rapidly over the years. Spammers attract Internet users’ attention by repeatedly broadcasting unsolicited content on social media platforms, which degrades the social experience of legitimate users. As a result, spam detection models are required to deter social media spammers. The goal of spam detection is to automatically classify content such as tweets as spam or non-spam. Past studies have built successful spam detection models with numerous machine learning and deep learning methods. In this project, the deep learning models LSTM, CNN, and Transformer were evaluated on a publicly available Twitter dataset. Text processing techniques were applied to the original dataset to create 3 modified datasets for the experiments. Three word embedding approaches were evaluated: a Word2Vec model, pre-trained GloVe vectors, and random embedding weight initialisation. Lastly, the classification performance of LSTM, CNN, and Transformer was compared with related works. Experimental results showed that LSTM with random embedding weight initialisation achieved the best spam precision and specificity scores of 80% and 87%, respectively. Furthermore, my LSTM experimental results showed performance comparable to that of other related works.
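
The summary reports spam precision and specificity without defining them. The sketch below shows one common way to compute both from binary predictions. It assumes spam is the positive class (label 1), which the record does not state; the function name and the toy labels are hypothetical and are not taken from the project or its dataset.

```python
def spam_precision_and_specificity(y_true, y_pred, spam_label=1):
    """Return (precision, specificity), treating `spam_label` as the positive class.

    Assumed definitions (not confirmed by the record):
      precision   = TP / (TP + FP)  -> share of tweets flagged as spam that are spam
      specificity = TN / (TN + FP)  -> share of non-spam tweets correctly left alone
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == spam_label and truth == spam_label:
            tp += 1  # spam correctly flagged
        elif pred == spam_label:
            fp += 1  # legitimate tweet wrongly flagged
        elif truth == spam_label:
            fn += 1  # spam that slipped through
        else:
            tn += 1  # legitimate tweet correctly passed
    # Guard against empty denominators on degenerate inputs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return precision, specificity


# Toy labels for illustration only (1 = spam, 0 = non-spam).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
precision, specificity = spam_precision_and_specificity(y_true, y_pred)
print(f"precision={precision:.2f} specificity={specificity:.2f}")
```

Both metrics share the false-positive count, so they penalise a model for flagging legitimate tweets as spam. Neither one reflects spam that goes undetected; recall would be needed for that.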