GA-based feature subset selection in a spam/non-spam detection system

Spam has created a significant security problem for computer users everywhere. Spammers use deceptive techniques to disguise the parts of messages that could be used to identify spam. For instance, a spammer does not need to spend much money or bandwidth to send junk mail, even more than...

Bibliographic Details
Main Authors:	Behjat, Amir Rajabi; Mustapha, Aida; Nezamabadi-pour, Hossein; Sulaiman, Md. Nasir; Mustapha, Norwati
Format:	Conference or Workshop Item
Language:	English
Published:	IEEE 2012
Online Access:	http://psasir.upm.edu.my/id/eprint/47692/1/GA-based%20feature%20subset%20selection%20in%20a%20spamnon-spam%20detection%20system.pdf http://psasir.upm.edu.my/id/eprint/47692/
Institution:	Universiti Putra Malaysia

id	my.upm.eprints.47692
record_format	eprints
spelling	my.upm.eprints.47692 2016-07-14T04:47:30Z http://psasir.upm.edu.my/id/eprint/47692/ IEEE 2012 Conference or Workshop Item PeerReviewed application/pdf en Behjat, Amir Rajabi and Mustapha, Aida and Nezamabadi-pour, Hossein and Sulaiman, Md. Nasir and Mustapha, Norwati (2012) GA-based feature subset selection in a spam/non-spam detection system. In: International Conference on Computer and Communication Engineering (ICCCE 2012), 3-5 July 2012, Kuala Lumpur, Malaysia. (pp. 675-679). doi:10.1109/ICCCE.2012.6271302
institution	Universiti Putra Malaysia
building	UPM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Putra Malaysia
content_source	UPM Institutional Repository
url_provider	http://psasir.upm.edu.my/
language	English
description	Spam has created a significant security problem for computer users everywhere. Spammers use deceptive techniques to disguise the parts of messages that could be used to identify spam. For instance, a spammer does not need to spend much money or bandwidth to send junk mail, even more than one hundred emails. From the feature selection perspective, one specific problem that decreases the accuracy of spam and non-spam email classification is high data dimensionality. Reducing dimensionality therefore amounts to reducing the number of irrelevant features. In this paper, a genetic algorithm (GA) is applied during feature selection in an effort to decrease the number of useless features in a collection of high-dimensional email bodies and subjects. Next, a Multi-Layer Perceptron (MLP) is employed to classify the features that have been selected by the GA. Using the LingSpam benchmark corpora as the dataset, the experimental results showed that a GA feature selector with the MLP classifier not only decreases the data dimensionality but also increases the spam detection rate compared with other classifiers such as SVM and Naïve Bayes.
format	Conference or Workshop Item
author	Behjat, Amir Rajabi; Mustapha, Aida; Nezamabadi-pour, Hossein; Sulaiman, Md. Nasir; Mustapha, Norwati
author_sort	Behjat, Amir Rajabi
title	GA-based feature subset selection in a spam/non-spam detection system
title_sort	ga-based feature subset selection in a spam/non-spam detection system
publisher	IEEE
publishDate	2012
url	http://psasir.upm.edu.my/id/eprint/47692/1/GA-based%20feature%20subset%20selection%20in%20a%20spamnon-spam%20detection%20system.pdf http://psasir.upm.edu.my/id/eprint/47692/
_version_	1643833953918910464
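
The approach outlined in the description field (a GA searching over feature subsets, followed by a classifier on the selected features) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the paper classifies the selected features with an MLP on the LingSpam corpus, and the record does not state how the GA scores candidate subsets, so a nearest-centroid classifier on synthetic data stands in here to keep the example runnable with the standard library alone. All function names, the per-feature penalty, and the GA settings (population size, mutation rate, tournament selection, one-point crossover, elitism) are assumptions.

```python
import random


def centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier on the masked features.

    Assumption: a stand-in scorer; the record does not say how subsets are scored.
    """
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]

    def sq_dist(x, centre):
        return sum((x[j] - m) ** 2 for j, m in zip(cols, centre))

    hits = sum(min(centroids, key=lambda c: sq_dist(x, centroids[c])) == t
               for x, t in zip(X, y))
    return hits / len(y)


def fitness(X, y, mask, penalty=0.01):
    # Reward accuracy and lightly penalise every selected feature (hypothetical weight).
    return centroid_accuracy(X, y, mask) - penalty * sum(mask)


def tournament(X, y, pop, rng):
    # Binary tournament: the fitter of two randomly drawn chromosomes wins.
    a, b = rng.sample(pop, 2)
    return a if fitness(X, y, a) >= fitness(X, y, b) else b


def ga_select(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Return a 0/1 mask over features; requires at least two features."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=lambda m: fitness(X, y, m))
    for _ in range(generations):
        nxt = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(nxt) < pop_size:
            p1, p2 = tournament(X, y, pop, rng), tournament(X, y, pop, rng)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=lambda m: fitness(X, y, m))
    return best


def make_toy(n_rows=60, n_noise=7, seed=1):
    # Synthetic data: feature 0 separates the two classes, the rest is noise.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_rows):
        label = i % 2
        X.append([4.0 * label + rng.gauss(0, 0.5)]
                 + [rng.gauss(0, 1) for _ in range(n_noise)])
        y.append(label)
    return X, y


X, y = make_toy()
best = ga_select(X, y)
print("selected feature indices:", [j for j, bit in enumerate(best) if bit])
```

In a setting closer to the paper, the rows of `X` would be term features extracted from email bodies and subjects, and the final classifier trained on the selected columns would be an MLP rather than the stand-in used here.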

Similar Items