A novel framework for identifying twitter spam data using machine learning algorithms

Twitter has become one of the most popular social media platforms in the world. However, its popularity makes it an attractive platform for spammers. Twitter spam has become a severe issue. It refers to unsolicited tweets containing malicious links that direct victims to external...

Bibliographic Details
Main Authors:	Maziku, Susana Boniphace; Abdul Rahiman, Amir Rizaan; Muhammed, Abdullah; Abdullah @ Selimun, Mohd Taufik
Format:	Article
Language:	English
Published:	Science Press 2020
Online Access:	http://psasir.upm.edu.my/id/eprint/87624/1/ABSTRACT.pdf http://psasir.upm.edu.my/id/eprint/87624/ https://www.jsju.org/index.php/journal/article/view/712#:~:text=This%20study%20introduces%20a%20novel,information%20is%20the%20study's%20methods.
Institution:	Universiti Putra Malaysia

id	my.upm.eprints.87624
record_format	eprints
spelling	my.upm.eprints.87624 2022-07-06T05:04:20Z http://psasir.upm.edu.my/id/eprint/87624/ A novel framework for identifying twitter spam data using machine learning algorithms. Maziku, Susana Boniphace; Abdul Rahiman, Amir Rizaan; Muhammed, Abdullah; Abdullah @ Selimun, Mohd Taufik. Twitter has become one of the most popular social media platforms in the world. However, its popularity makes it an attractive platform for spammers, and Twitter spam has become a severe issue. Twitter spam refers to unsolicited tweets containing malicious links that direct victims to external sites hosting malware downloads, terrorist content, phishing, drug sales, scams, etc. Previous studies have approached spam detection as a high-dimensional, time-consuming classification problem, which calls for new methods. This study introduces a novel framework for identifying Twitter spam data based on machine learning algorithms. The method begins with data pre-processing for clean-up, noise removal, and handling of unpredictable or incomplete data, and then reduces the number of features in the tweet dataset using mutual information. Feature selection is introduced to select the most important of the extracted high-dimensional features; the selected features are fed into the minimum Redundancy and Maximal Relevance algorithm, and random forest is applied for classification. This approach achieves higher classification accuracy and speed. Experimental results confirm its effectiveness: accuracy is improved by 90% in 0hr 0m 20s, whereas the existing system has a completion time of 2.022 seconds and an accuracy of 80%. The research results contribute significantly to the field of cyber-security by forming a real-time system using machine learning algorithms.
Science Press 2020 Article PeerReviewed text en http://psasir.upm.edu.my/id/eprint/87624/1/ABSTRACT.pdf Maziku, Susana Boniphace and Abdul Rahiman, Amir Rizaan and Muhammed, Abdullah and Abdullah @ Selimun, Mohd Taufik (2020) A novel framework for identifying twitter spam data using machine learning algorithms. Xinan Jiaotong Daxue Xuebao/Journal of Southwest Jiaotong University, 55 (5). pp. 1-20. ISSN 0258-2724 https://www.jsju.org/index.php/journal/article/view/712#:~:text=This%20study%20introduces%20a%20novel,information%20is%20the%20study's%20methods. 10.35741/issn.0258-2724.55.5.1
institution	Universiti Putra Malaysia
building	UPM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Putra Malaysia
content_source	UPM Institutional Repository
url_provider	http://psasir.upm.edu.my/
language	English
description	Twitter has become one of the most popular social media platforms in the world. However, its popularity makes it an attractive platform for spammers, and Twitter spam has become a severe issue. Twitter spam refers to unsolicited tweets containing malicious links that direct victims to external sites hosting malware downloads, terrorist content, phishing, drug sales, scams, etc. Previous studies have approached spam detection as a high-dimensional, time-consuming classification problem, which calls for new methods. This study introduces a novel framework for identifying Twitter spam data based on machine learning algorithms. The method begins with data pre-processing for clean-up, noise removal, and handling of unpredictable or incomplete data, and then reduces the number of features in the tweet dataset using mutual information. Feature selection is introduced to select the most important of the extracted high-dimensional features; the selected features are fed into the minimum Redundancy and Maximal Relevance algorithm, and random forest is applied for classification. This approach achieves higher classification accuracy and speed. Experimental results confirm its effectiveness: accuracy is improved by 90% in 0hr 0m 20s, whereas the existing system has a completion time of 2.022 seconds and an accuracy of 80%. The research results contribute significantly to the field of cyber-security by forming a real-time system using machine learning algorithms.
format	Article
author	Maziku, Susana Boniphace; Abdul Rahiman, Amir Rizaan; Muhammed, Abdullah; Abdullah @ Selimun, Mohd Taufik
author_sort	Maziku, Susana Boniphace
title	A novel framework for identifying twitter spam data using machine learning algorithms
title_sort	novel framework for identifying twitter spam data using machine learning algorithms
publisher	Science Press
publishDate	2020
url	http://psasir.upm.edu.my/id/eprint/87624/1/ABSTRACT.pdf http://psasir.upm.edu.my/id/eprint/87624/ https://www.jsju.org/index.php/journal/article/view/712#:~:text=This%20study%20introduces%20a%20novel,information%20is%20the%20study's%20methods.
_version_	1738511961852739584
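
The description above names mutual information and the minimum Redundancy and Maximal Relevance algorithm as the feature-reduction steps. The following is a minimal sketch of how those two steps could fit together, not the authors' implementation. It assumes discrete (binary) features, and the feature names (has_url, has_url_copy, many_hashtags, is_reply), the toy labels, and the helper functions are all hypothetical. The random forest classification step is left out to keep the sketch short.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedy mRMR: repeatedly pick the feature with the highest
    relevance I(feature; label) minus its mean redundancy
    I(feature; already selected feature)."""
    relevance = {
        name: mutual_information(col, labels) for name, col in features.items()
    }
    selected = []

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while len(selected) < min(k, len(features)):
        remaining = [name for name in features if name not in selected]
        selected.append(max(remaining, key=score))
    return selected


# Hypothetical toy data: 8 tweets, label 1 = spam, 0 = not spam.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "has_url":       [1, 1, 1, 1, 0, 0, 0, 1],
    "has_url_copy":  [1, 1, 1, 1, 0, 0, 0, 1],  # duplicate, fully redundant
    "many_hashtags": [1, 1, 0, 1, 0, 0, 1, 0],
    "is_reply":      [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
}

chosen = mrmr_select(features, labels, k=2)
print(chosen)
```

On this toy data the duplicated feature is passed over in the second round because its redundancy with has_url outweighs its relevance, which is the behaviour the redundancy term is meant to produce. The selected columns would then be passed to a classifier such as a random forest.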

Similar Items