An ensemble learning method for spam email detection system based on metaheuristic algorithms
Main Author: | Behjat, Amir Rajabi |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2015 |
Online Access: | http://psasir.upm.edu.my/id/eprint/65264/1/FSKTM%202015%2049IR.pdf http://psasir.upm.edu.my/id/eprint/65264/ |
Institution: | Universiti Putra Malaysia |
id |
my.upm.eprints.65264 |
---|---|
record_format |
eprints |
spelling |
my.upm.eprints.65264 2018-09-04T01:36:20Z http://psasir.upm.edu.my/id/eprint/65264/ 2015-06 Thesis NonPeerReviewed text en http://psasir.upm.edu.my/id/eprint/65264/1/FSKTM%202015%2049IR.pdf Behjat, Amir Rajabi (2015) An ensemble learning method for spam email detection system based on metaheuristic algorithms. PhD thesis, Universiti Putra Malaysia. |
institution |
Universiti Putra Malaysia |
building |
UPM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Putra Malaysia |
content_source |
UPM Institutional Repository |
url_provider |
http://psasir.upm.edu.my/ |
language |
English |
description |
In email spam detection, not only the different parts and content of an email matter; its structural and special features also play an effective role in dimensionality reduction and classifier accuracy. For example, spammers change message patterns to produce spam, such as writing the message in JavaScript or using varied advertising images and words to form features or attributes. Even careful users may fail to report an email as spam when the spammer is trying to defraud them.
The aim of data mining is to discover previously unknown patterns in large databases. A well-known data mining task is classification, which automatically predicts the class of new instances from known features or attributes. The major problems in classification are large amounts of training data, large numbers of features and the changing behavior of data streams, all of which reduce accuracy and increase computational cost in the classifier training phase. Feature subset selection and classifier ensemble learning are well-established techniques for addressing these problems. Various techniques based on different algorithms have recently been developed; however, their classification accuracy and computational cost remain unsatisfactory.
To address these challenges, this study proceeds in two phases. In the first phase, a novel architecture based on ensemble feature selection, combining the Modified Binary Bat Algorithm (NBBA), the Binary Quantum Particle Swarm Optimization (QBPSO) algorithm and the Binary Quantum Gravitational Search Algorithm (QBGSA), is hybridized with the Multi-layer Perceptron (MLP) classifier to select relevant feature subsets and improve classification accuracy. In the second phase, a classifier ensemble learning model is proposed with two separate goals: (i) to select a relevant subset of the original features using the Binary Quantum Gravitational Search Algorithm (QBGSA), and (ii) to mine data streams in multiple data chunks and overcome the failure of single classifiers, based on the SVM, MLP and K-NN algorithms.
Several experiments were conducted to evaluate the performance of the proposed ensemble methods on four benchmark datasets, namely LingSpam, SpamAssassin, Spambase and CSDMC2010. Compared with single feature selection algorithms, the proposed ensemble method reduces dimensionality and the number of irrelevant features while producing reasonable classifier accuracy. The ensemble classifier learning method also achieves better accuracy than single classifiers in mining data streams and selecting subsets of relevant features. In addition, the ensemble algorithms select highly relevant features to feed the MLP, outperforming the individual techniques with fewer false positives, higher accuracy and better CPU time. |
format |
Thesis |
author |
Behjat, Amir Rajabi |
title |
An ensemble learning method for spam email detection system based on metaheuristic algorithms |
publishDate |
2015 |
url |
http://psasir.upm.edu.my/id/eprint/65264/1/FSKTM%202015%2049IR.pdf http://psasir.upm.edu.my/id/eprint/65264/ |
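The first phase described in the abstract wraps binary metaheuristic feature selection around a classifier. As a minimal sketch of that wrapper idea only, assuming a standard binary PSO with a sigmoid transfer function in place of the thesis's NBBA, QBPSO and QBGSA, and a nearest-centroid classifier in place of the MLP (both swaps keep the example dependency-free), each particle is a 0/1 feature mask scored by validation accuracy minus a small penalty `alpha` on the fraction of selected features. The function names, coefficients and fitness formula are hypothetical illustrations, not the thesis's implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Dataset = Tuple[List[List[float]], List[int]]


def nearest_centroid_accuracy(train_x, train_y, test_x, test_y, mask) -> float:
    """Validation accuracy of a nearest-centroid classifier on the masked features.

    Assumption: this classifier stands in for the MLP used in the thesis.
    """
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(r[i] for r in rows) / len(rows) for i in idx]
    correct = 0
    for x, y in zip(test_x, test_y):
        feats = [x[i] for i in idx]
        pred = min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(feats, centroids[c])))
        correct += pred == y
    return correct / len(test_y)


def binary_pso_select(train: Dataset, valid: Dataset, n_features: int,
                      n_particles: int = 10, iters: int = 20,
                      alpha: float = 0.01, seed: int = 0):
    """Hypothetical wrapper: binary PSO searching over 0/1 feature masks."""
    rng = random.Random(seed)

    def fitness(mask: Sequence[int]) -> float:
        # Reward accuracy and lightly penalise large subsets (dimensionality reduction).
        acc = nearest_centroid_accuracy(*train, *valid, mask)
        return acc - alpha * sum(mask) / n_features

    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    w, c1, c2, vmax = 0.7, 1.5, 1.5, 4.0  # illustrative coefficients, not tuned

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-vmax, min(vmax, v))
                # Sigmoid transfer: velocity -> probability that feature d is selected.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


if __name__ == "__main__":
    data_rng = random.Random(1)

    def make(n: int) -> Dataset:
        xs, ys = [], []
        for _ in range(n):
            y = data_rng.randint(0, 1)
            # Features 0-1 depend on the label; features 2-5 are pure noise.
            xs.append([y + data_rng.gauss(0, 0.3), y + data_rng.gauss(0, 0.3)]
                      + [data_rng.gauss(0, 1) for _ in range(4)])
            ys.append(y)
        return xs, ys

    mask, fit = binary_pso_select(make(80), make(40), n_features=6)
    print("selected mask:", mask, "fitness:", round(fit, 3))
```

The penalty term is one common way to trade accuracy against subset size; the thesis's own fitness function may differ.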
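The second phase described in the abstract mines data streams chunk by chunk with a classifier ensemble built on SVM, MLP and K-NN. The sketch below shows one common chunk-based scheme, assumed here for illustration: each incoming chunk is first classified by majority vote of the models kept from the most recent `window` chunks, then three new base models are trained on that chunk. To stay dependency-free, a linear perceptron and a nearest-centroid classifier stand in for SVM and MLP; only K-NN matches the abstract. All names and parameters are hypothetical.

```python
import random
from collections import Counter, deque
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[List[float], int]
Model = Callable[[List[float]], int]


def _sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_knn(chunk: List[Sample], k: int = 3) -> Model:
    # K-NN, as named in the abstract: memorise the chunk, vote among the k nearest.
    def predict(x):
        nearest = sorted(chunk, key=lambda s: _sqdist(x, s[0]))[:k]
        return Counter(y for _, y in nearest).most_common(1)[0][0]
    return predict


def train_centroid(chunk: List[Sample]) -> Model:
    # Stand-in (assumption) for the MLP member: nearest class centroid.
    labels = sorted({y for _, y in chunk})
    cents = {c: [sum(col) / len(col) for col in zip(*[x for x, y in chunk if y == c])]
             for c in labels}
    return lambda x: min(labels, key=lambda c: _sqdist(x, cents[c]))


def train_perceptron(chunk: List[Sample], epochs: int = 10, lr: float = 0.1) -> Model:
    # Stand-in (assumption) for the linear SVM member; labels must be 0/1.
    w, b = [0.0] * len(chunk[0][0]), 0.0
    for _ in range(epochs):
        for x, y in chunk:
            err = y - (1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0)
            if err:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def majority_vote(models, x) -> int:
    # Ties go to the label counted first (Counter.most_common ordering).
    return Counter(m(x) for m in models).most_common(1)[0][0]


def chunk_ensemble(chunks: List[List[Sample]], window: int = 3) -> List[float]:
    """Test-then-train: score each chunk with the current ensemble, then add three
    new members trained on it, keeping only the last `window` chunks' models."""
    members = deque(maxlen=3 * window)
    accuracies = []
    for chunk in chunks:
        if members:
            correct = sum(majority_vote(members, x) == y for x, y in chunk)
            accuracies.append(correct / len(chunk))
        members.extend([train_knn(chunk), train_centroid(chunk), train_perceptron(chunk)])
    return accuracies


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical stream: 5 chunks of 20 two-feature samples, two well-separated classes.
    stream = [[([5.0 * (i % 2) + rng.uniform(-0.5, 0.5) for _ in range(2)], i % 2)
               for i in range(20)] for _ in range(5)]
    print(chunk_ensemble(stream))  # prints [1.0, 1.0, 1.0, 1.0]
```

Keeping a bounded window of recent members is one simple way to let the ensemble follow changing stream behavior; the thesis may combine its SVM, MLP and K-NN members differently.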