Improving spam detection on Twitter using deep learning
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2021 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/148957 |
Institution: | Nanyang Technological University |
Summary: | Advances in technology have made social media easily accessible to Internet users. However, the number of content polluters, also known as spammers, has increased rapidly over the years. Spammers attract Internet users’ attention by repeatedly broadcasting unsolicited content on social media platforms, which degrades the social experience of legitimate users. As a result, spam detection models are required to deter social media spammers. The goal of spam detection is to automatically classify content such as tweets as spam or non-spam. Past studies have built successful spam detection models using numerous machine learning and deep learning methods.
In this project, deep learning models such as LSTM, CNN, and Transformer were evaluated on a publicly available Twitter dataset. Text processing techniques were applied to the original dataset to create three modified datasets for the experiments. Three word embedding approaches were evaluated: a Word2Vec model, pre-trained GloVe vectors, and random embedding weight initialisation. Lastly, the classification performance of LSTM, CNN, and Transformer was compared with related works. The experimental results showed that LSTM with random embedding weight initialisation achieved the best spam precision and specificity scores of 80% and 87%, respectively. Furthermore, my LSTM results were comparable to those reported in related works. |
---|
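The summary reports spam precision and specificity, which are easy to confuse. The sketch below shows how the two scores are conventionally derived from a binary confusion matrix. It is not taken from the project: the function names, the label encoding (1 for spam, 0 for non-spam), and the toy labels are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), assuming labels are 1 (spam) or 0 (non-spam)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1  # spam correctly flagged
        elif pred == 1 and truth == 0:
            fp += 1  # legitimate tweet wrongly flagged as spam
        elif pred == 0 and truth == 0:
            tn += 1  # legitimate tweet correctly passed
        else:
            fn += 1  # spam that slipped through
    return tp, fp, tn, fn


def spam_precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of tweets flagged as spam that really are spam: tp / (tp + fp)."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of legitimate tweets correctly passed: tn / (tn + fp)."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp) if (tn + fp) else 0.0


# Hypothetical toy labels, not data from the project.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(spam_precision(y_true, y_pred), specificity(y_true, y_pred))
```

Both scores penalise false positives, so a model that scores well on them rarely mislabels legitimate tweets as spam. Neither score reflects spam that goes undetected; that is captured by recall.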