Filtering spam mail in non-segmented languages using hybrid approach: the integration of stopword removal, n-gram extraction and classification techniques

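Junk mail, or spam, is regarded as a major problem today. Spam can lead to cybercrime that affects individuals and organizations alike, so many people and businesses look for spam prevention techniques to protect their data and computer systems. Spam typically advertises products or services and may also carry viruses, malware, spyware and similar threats. Many people assume that spam does no damage; in fact, it raises management costs and wastes resources. Verifying and filtering spam therefore deserves attention. This paper introduces a hybrid approach that combines three techniques (stop-word removal, n-gram extraction and data classification) to filter spam email while simplifying system development. Because the approach is language independent, it can be applied across many different languages. To evaluate it, an experimental study used the CSDMC2010 spam mail corpus, comprising 198 common (non-spam) emails, 202 spam emails and 10 selected emails. The results showed that the proposed technique identified whether an email was spam with 93.2% accuracy. The hybrid approach could therefore help users and organizations reduce computer security risks.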
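The title names three stages: stop-word removal, n-gram extraction and classification. This record does not say how each stage is implemented, so the Python sketch below is only an illustration under stated assumptions, not the authors' method. It assumes that stop words are deleted as raw substrings (non-segmented text such as Thai has no spaces to split words on), that features are overlapping character trigrams, and that the classifier is multinomial naive Bayes with add-one smoothing. The stop-word list, n = 3, the classifier choice, the helper names and the toy training strings are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy stop-word list; the paper's actual list is not given in this record.
STOPWORDS = ["the", "for", "your"]


def remove_stopwords(text, stopwords=STOPWORDS):
    """Delete stop words as raw substrings (assumption: no word boundaries exist)."""
    for word in stopwords:
        text = text.replace(word, "")
    return text


def char_ngrams(text, n=3):
    """Return overlapping character n-grams; n=3 is an illustrative choice."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def features(text):
    """Hypothetical pipeline: lowercase, remove stop words, extract character trigrams."""
    return char_ngrams(remove_stopwords(text.lower()))


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (a swapped-in classifier)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc)
        n = len(labels)
        self.log_priors = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        self.vocab_size = len(set().union(*self.counts.values()))
        return self

    def predict(self, doc):
        def score(c):
            # Log prior plus smoothed log likelihood of every n-gram in the document.
            denom = self.totals[c] + self.vocab_size
            return self.log_priors[c] + sum(
                math.log((self.counts[c][g] + 1) / denom) for g in doc
            )
        return max(self.classes, key=score)


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    hits = sum(model.predict(features(t)) == y for t, y in zip(texts, labels))
    return hits / len(labels)


# Toy training data written without spaces to mimic non-segmented text.
train_texts = ["winfreemoneynow", "claimyourfreeprize",
               "meetingagendafortomorrow", "pleasereviewthereport"]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayes().fit([features(t) for t in train_texts], train_labels)
print(model.predict(features("freeprizemoney")))   # -> spam
print(model.predict(features("reviewtheagenda")))  # -> ham
```

Character n-grams sidestep the need for a word segmenter, which is one plausible reading of why such a pipeline can be called language independent. Deleting stop words as substrings, however, can also clip longer words that happen to contain them, so a practical system would likely need a more careful matching rule.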


Bibliographic Details
Main Authors: Khumsong, Ployphailin; Chumwatana, Todsanai; Augsirikul, Supanit
Format: Conference or Workshop Item
Language: English
Published: 2016
Subjects: QA Mathematics
Online Access:http://repo.uum.edu.my/20125/1/KMICe2016%20373%20378.pdf
http://repo.uum.edu.my/20125/
http://www.kmice.cms.net.my/kmice2016/files/KMICe2016_eproceeding.pdf
Institution: Universiti Utara Malaysia
Citation: Khumsong, Ployphailin and Chumwatana, Todsanai and Augsirikul, Supanit (2016). Filtering spam mail in non-segmented languages using hybrid approach: the integration of stopword removal, n-gram extraction and classification techniques. In: Knowledge Management International Conference (KMICe) 2016, 29–30 August 2016, Chiang Mai, Thailand. Peer reviewed.