Study on rough set and chi square statistic feature selection for spam classification

64 p.

Bibliographic Details
Main Author: Juniarto Samsudin.
Other Authors: Zhong Zhaowei
Format: Theses and Dissertations
Published: 2010
Subjects: DRNTU::Engineering::Systems engineering
Online Access: http://hdl.handle.net/10356/35982
Institution: Nanyang Technological University
id sg-ntu-dr.10356-35982
record_format dspace
spelling sg-ntu-dr.10356-35982 2023-03-11T17:06:55Z
Study on rough set and chi square statistic feature selection for spam classification
Juniarto Samsudin.
Zhong Zhaowei
School of Mechanical and Aerospace Engineering
DRNTU::Engineering::Systems engineering
64 p.
Spam messages waste recipients' time and resources. This dissertation examines the effectiveness of feature selection, in particular the rough set and chi square statistic methods, in combination with the J48 decision tree classifier for e-mail classification. Experiments were performed on the SpamAssassin corpus, with features selected using word age, the chi square statistic, and rough set attribute reduction. Performance is measured with 10-fold cross-validation in terms of Area Under the Receiver Operating Characteristic Curve (AUC), precision, and recall. The results show that feature selection not only improves the performance of the classifier but is also an essential step in e-mail classification. The experiments also reveal that e-mail messages contain a great deal of noise and bad features, which should be removed to increase the performance of the classifier.
Master of Science (Smart Product Design)
2010-04-23T02:21:46Z
2010-04-23T02:21:46Z
2007
2007
Thesis
http://hdl.handle.net/10356/35982
application/pdf
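The chi square statistic named in the abstract scores how strongly a term's presence is associated with the spam class. The following is a minimal sketch of the standard 2x2 contingency-table form of that statistic, not the thesis's own implementation; the function names `chi_square` and `top_terms` and the example counts are hypothetical and chosen only for illustration.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: spam messages that contain the term
    n10: non-spam messages that contain the term
    n01: spam messages that do not contain the term
    n00: non-spam messages that do not contain the term
    """
    n = n11 + n10 + n01 + n00
    # Product of the four marginal totals (two class totals, two term totals).
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        # A zero marginal means the term or class never varies: no evidence.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def top_terms(counts, k):
    """Return the k terms with the highest chi-square score.

    counts maps each term to its (n11, n10, n01, n00) tuple.
    """
    scored = {term: chi_square(*cells) for term, cells in counts.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]


# Hypothetical counts for a 40-message corpus (20 spam, 20 non-spam).
example = {
    "viagra": (20, 0, 0, 20),    # appears only in spam
    "meeting": (10, 10, 10, 10), # independent of the class
    "free": (15, 5, 5, 15),      # leans toward spam
}

selected = top_terms(example, 2)
```

Under these assumed counts, a term spread evenly across both classes scores zero and is dropped, which is the sense in which feature selection removes noisy features before a classifier is trained.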
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
topic DRNTU::Engineering::Systems engineering
description 64 p.
author2 Zhong Zhaowei
author_facet Zhong Zhaowei
Juniarto Samsudin.
format Theses and Dissertations
author Juniarto Samsudin.
author_sort Juniarto Samsudin.
title Study on rough set and chi square statistic feature selection for spam classification
publishDate 2010
url http://hdl.handle.net/10356/35982
_version_ 1761781641635692544