An ensemble learning method for spam email detection system based on metaheuristic algorithms

In email spam detection, not only the different parts and content of an email matter; its structural and special features also play an effective role in dimensionality reduction and classifier accuracy. For example, a spammer changes message patterns to craft spam, such as wri...

Bibliographic Details
Main Author: Behjat, Amir Rajabi
Format: Thesis
Language: English
Published: 2015
Online Access: http://psasir.upm.edu.my/id/eprint/65264/1/FSKTM%202015%2049IR.pdf
http://psasir.upm.edu.my/id/eprint/65264/
Institution: Universiti Putra Malaysia
id my.upm.eprints.65264
record_format eprints
spelling my.upm.eprints.65264 2018-09-04T01:36:20Z http://psasir.upm.edu.my/id/eprint/65264/ An ensemble learning method for spam email detection system based on metaheuristic algorithms Behjat, Amir Rajabi 2015-06 Thesis NonPeerReviewed text en http://psasir.upm.edu.my/id/eprint/65264/1/FSKTM%202015%2049IR.pdf Behjat, Amir Rajabi (2015) An ensemble learning method for spam email detection system based on metaheuristic algorithms. PhD thesis, Universiti Putra Malaysia.
institution Universiti Putra Malaysia
building UPM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Putra Malaysia
content_source UPM Institutional Repository
url_provider http://psasir.upm.edu.my/
language English
description In email spam detection, not only the different parts and content of an email matter; its structural and special features also play an effective role in dimensionality reduction and classifier accuracy. For example, a spammer changes message patterns to craft spam, such as writing the message in JavaScript or using different advertising images and words, and these patterns form features or attributes. Even savvy users can fail to report an email as spam when the spammer tries to defraud them. The aim of data mining is to search for and find undetermined patterns in huge databases. A well-known task is classification, which automatically predicts the class of new instances using known features or attributes. The major problems in classification are the large amount of training data, the large number of features, and the varying behavior of data streams, which reduce accuracy and increase computational cost in the classifier training phase. Feature subset selection and classifier ensemble learning are familiar techniques well suited to mitigating these problems. Recently, various techniques based on different algorithms have been developed. However, their classification accuracy and computational cost remain unsatisfactory. To address these challenges, the first phase of this study hybridizes a novel architecture based on ensemble feature selection techniques, including the Modified Binary Bat Algorithm (NBBA), the Binary Quantum Particle Swarm Optimization (QBPSO) algorithm, and the Binary Quantum Gravitational Search Algorithm (QBGSA), with the Multi-layer Perceptron (MLP) classifier in order to select relevant feature subsets and improve classification accuracy.
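The abstract does not state how the outputs of the three binary feature selectors are combined. As a minimal sketch, assuming each selector returns a 0/1 mask over the features and that the ensemble keeps a feature when a majority of selectors chose it, the combination step could look like the following. The function names and the example masks are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence


def majority_vote_mask(masks: Sequence[Sequence[int]]) -> List[int]:
    """Keep a feature when more than half of the selectors chose it.

    Assumption: majority voting is used here only for illustration; the
    thesis may combine its selectors differently.
    """
    if not masks:
        raise ValueError("at least one mask is required")
    n_features = len(masks[0])
    if any(len(m) != n_features for m in masks):
        raise ValueError("all masks must have the same length")
    threshold = len(masks) / 2
    # zip(*masks) yields, per feature, the votes of all selectors.
    return [1 if sum(votes) > threshold else 0 for votes in zip(*masks)]


def apply_mask(row: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Drop the feature values whose mask bit is 0."""
    return [value for value, keep in zip(row, mask) if keep]


# Hypothetical masks standing in for the outputs of three binary
# metaheuristic selectors (for example NBBA, QBPSO, and QBGSA).
mask_a = [1, 0, 1, 1, 0]
mask_b = [1, 1, 0, 1, 0]
mask_c = [0, 0, 1, 1, 0]

combined = majority_vote_mask([mask_a, mask_b, mask_c])
reduced = apply_mask([0.2, 0.9, 0.4, 0.7, 0.1], combined)
```

In a wrapper setup such as the one the abstract describes, the reduced feature vectors would then be passed to the classifier (the MLP in the thesis) to score the selected subset.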
In the second phase, a classifier ensemble learning model is proposed with two separate aims: (i) to select a relevant subset of the original features based on the Binary Quantum Gravitational Search Algorithm (QBGSA), and (ii) to mine data streams using various data chunks and overcome the failure of single classifiers based on the SVM, MLP, and K-NN algorithms. Several experiments evaluate the performance of the proposed ensemble methods on four benchmark datasets, namely LingSpam, SpamAssassin, Spambase, and CSDMC2010. Compared with different single algorithms for feature selection, the experimental results show that the proposed ensemble method is able to reduce dimensionality and the number of irrelevant features while producing reasonable classifier accuracy. The experiments demonstrate that the ensemble classifier learning method produces better accuracy in mining data streams and selecting a subset of relevant features than single classifiers. In addition, the experiments show that the ensemble algorithms select highly relevant features to feed the MLP, outperforming the individual techniques in classifier performance through a lower false positive rate, higher accuracy, and better CPU time.
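The second phase can be illustrated with a small sketch of chunk-wise majority voting. It assumes the stream is processed in fixed-size chunks and that the ensemble label is the majority vote of the base classifiers. The three rule functions below are toy stand-ins for trained SVM, MLP, and K-NN models, and all names, thresholds, and data are hypothetical rather than taken from the thesis.

```python
from collections import Counter
from typing import Callable, Iterator, List, Sequence

# A classifier maps a feature vector to 1 (spam) or 0 (not spam).
Classifier = Callable[[Sequence[float]], int]


def chunks(stream: Sequence[Sequence[float]], size: int) -> Iterator[Sequence[Sequence[float]]]:
    """Yield consecutive chunks of at most `size` instances."""
    for start in range(0, len(stream), size):
        yield stream[start:start + size]


def vote(classifiers: Sequence[Classifier], x: Sequence[float]) -> int:
    """Return the label predicted by most classifiers (use an odd count)."""
    counts = Counter(clf(x) for clf in classifiers)
    return counts.most_common(1)[0][0]


def classify_stream(classifiers: Sequence[Classifier],
                    stream: Sequence[Sequence[float]],
                    chunk_size: int) -> List[int]:
    """Label a stream chunk by chunk with the majority-vote ensemble."""
    predictions: List[int] = []
    for chunk in chunks(stream, chunk_size):
        predictions.extend(vote(classifiers, x) for x in chunk)
    return predictions


# Toy threshold rules standing in for trained SVM, MLP, and K-NN models.
def rule_first(x: Sequence[float]) -> int:
    return int(x[0] > 0.5)


def rule_second(x: Sequence[float]) -> int:
    return int(x[1] > 0.5)


def rule_third(x: Sequence[float]) -> int:
    return int(x[2] > 0.5)


stream = [
    [0.9, 0.8, 0.1],
    [0.1, 0.2, 0.9],
    [0.7, 0.6, 0.8],
    [0.2, 0.9, 0.3],
    [0.6, 0.1, 0.7],
]
predictions = classify_stream([rule_first, rule_second, rule_third], stream, chunk_size=2)
```

In this sketch the chunking only controls how the stream is consumed; a fuller stream-mining setup would typically also retrain or reweight the base classifiers as new chunks arrive, which the abstract does not detail.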
format Thesis
author Behjat, Amir Rajabi
title An ensemble learning method for spam email detection system based on metaheuristic algorithms
title_sort ensemble learning method for spam email detection system based on metaheuristic algorithms
publishDate 2015
_version_ 1643838264282447872