Youtube spam detection framework using naïve bayes and logistic regression

YouTube has become a popular social media platform. Because of this popularity, spammers use YouTube comments to distribute spam. This is a concern because spam can lead to phishing attacks, which can target any user who clicks a malicious link...

Bibliographic Details
Main Authors: Nur’Ain Maulat, Samsudin; Cik Feresa, Mohd Foozy; Nabilah, Alias; Palaniappan, Shamala; Nur Fadzilah, Othman; Wan Isni Sofiah, Wan Din
Format: Article
Language: English
Published: Institute of Advanced Engineering and Science 2019
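Subjects: HE Transportation and Communications; Q Science (General); TK Electrical engineering. Electronics Nuclear engineering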
Online Access: http://umpir.ump.edu.my/id/eprint/25114/1/Youtube%20spam%20detection%20framework%20using%20na%C3%AFve%20bayes.pdf
http://umpir.ump.edu.my/id/eprint/25114/
http://ijeecs.iaescore.com/index.php/IJEECS/article/view/18468
http://doi.org/10.11591/ijeecs.v14.i3.pp1508-1517
Institution: Universiti Malaysia Pahang
id my.ump.umpir.25114
record_format eprints
spelling my.ump.umpir.25114 2019-06-25T07:24:02Z http://umpir.ump.edu.my/id/eprint/25114/ Institute of Advanced Engineering and Science 2019-06 Article PeerReviewed pdf en cc_by_nc_4
Nur’Ain Maulat, Samsudin and Cik Feresa, Mohd Foozy and Nabilah, Alias and Palaniappan, Shamala and Nur Fadzilah, Othman and Wan Isni Sofiah, Wan Din (2019) Youtube spam detection framework using naïve bayes and logistic regression. Indonesian Journal of Electrical Engineering and Computer Science, 14 (3). pp. 1508-1517. ISSN 2502-4752 http://ijeecs.iaescore.com/index.php/IJEECS/article/view/18468 http://doi.org/10.11591/ijeecs.v14.i3.pp1508-1517
institution Universiti Malaysia Pahang
building UMP Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Pahang
content_source UMP Institutional Repository
url_provider http://umpir.ump.edu.my/
language English
topic HE Transportation and Communications
Q Science (General)
TK Electrical engineering. Electronics Nuclear engineering
description YouTube has become a popular social media platform. Because of this popularity, spammers use YouTube comments to distribute spam. This is a concern because spam can lead to phishing attacks, which can target any user who clicks a malicious link. Spam has characteristic features that can be analyzed and detected by classification, so enhanced features are proposed to detect YouTube spam. To conduct the experiments, a YouTube spam detection framework was developed that consists of five (5) phases: data collection, pre-processing, feature selection and extraction, classification, and detection. This paper proposes the framework and examines and validates each phase using two data mining tools. The features were constructed from an analysis of data collected from the YouTube Spam dataset, classified with Naïve Bayes and Logistic Regression, and tested in two data mining tools, Weka and RapidMiner. Thirteen (13) features tested in Weka and RapidMiner showed high accuracy and were therefore used throughout the experiments. Both Naïve Bayes and Logistic Regression scored slightly higher in Weka than in RapidMiner. In Weka, Naïve Bayes was more accurate than Logistic Regression, at 87.21% and 85.29% respectively. In RapidMiner, the accuracies differed only slightly, at 80.41% for Naïve Bayes and 80.88% for Logistic Regression. However, the precision of Naïve Bayes was higher than that of Logistic Regression.
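The classification and evaluation steps named in the description can be illustrated with a minimal sketch. It is not the authors' Weka or RapidMiner setup and does not use their thirteen features: it assumes plain bag-of-words counts, a hand-written multinomial Naïve Bayes with add-one smoothing, and a handful of invented comments (`TRAIN`, `TEST`). All function names are hypothetical. Logistic Regression is left out to keep the sketch short.

```python
import math
import re
from collections import Counter

# Hypothetical toy comments (1 = spam, 0 = legitimate); NOT the paper's dataset.
TRAIN = [
    ("check out my channel and subscribe", 1),
    ("free gift cards click this link", 1),
    ("subscribe to my channel for free stuff", 1),
    ("i love this song so much", 0),
    ("great video thanks for sharing", 0),
    ("this song brings back memories", 0),
]
TEST = [
    ("click the link to subscribe for free", 1),
    ("love this video", 0),
    ("thanks for the great song", 0),
    ("check my channel", 1),
]


def tokenize(text):
    """Pre-processing (assumed): lowercase and split into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train_nb(samples):
    """Count words per class for a multinomial Naive Bayes model."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][tok] + 1) / (n_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


def accuracy_and_precision(y_true, y_pred):
    """Accuracy = correct / all; precision = true spam / predicted spam."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


if __name__ == "__main__":
    model = train_nb(TRAIN)
    y_true = [label for _, label in TEST]
    y_pred = [predict_nb(model, text) for text, _ in TEST]
    acc, prec = accuracy_and_precision(y_true, y_pred)
    print(f"accuracy={acc:.2f} precision={prec:.2f}")
```

Accuracy and precision are computed separately because they can diverge: a classifier that flags one legitimate comment as spam loses precision even when most of its predictions are correct, which is why the description can report similar accuracies alongside a precision gap.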
format Article
author Nur’Ain Maulat, Samsudin
Cik Feresa, Mohd Foozy
Nabilah, Alias
Palaniappan, Shamala
Nur Fadzilah, Othman
Wan Isni Sofiah, Wan Din
title Youtube spam detection framework using naïve bayes and logistic regression
publisher Institute of Advanced Engineering and Science
publishDate 2019
url http://umpir.ump.edu.my/id/eprint/25114/1/Youtube%20spam%20detection%20framework%20using%20na%C3%AFve%20bayes.pdf
http://umpir.ump.edu.my/id/eprint/25114/
http://ijeecs.iaescore.com/index.php/IJEECS/article/view/18468
http://doi.org/10.11591/ijeecs.v14.i3.pp1508-1517
_version_ 1643669978996539392