Filtering spam mail in non-segmented languages using hybrid approach: the integration of stopword removal, n-gram extraction and classification techniques

Bibliographic Details
Main Authors: Khumsong, Ployphailin, Chumwatana, Todsanai, Augsirikul, Supanit
Format: Conference or Workshop Item
Language: English
Published: 2016
Online Access: http://repo.uum.edu.my/20125/1/KMICe2016%20373%20378.pdf
http://repo.uum.edu.my/20125/
http://www.kmice.cms.net.my/kmice2016/files/KMICe2016_eproceeding.pdf
Institution: Universiti Utara Malaysia
Description
Summary: Junk mail, or spam mail, is regarded as a major problem in today's world. Spam mail can lead to cybercrime that affects individuals and organizations alike. Many people and businesses seek spam mail prevention techniques to protect their own data and computer systems. Spam mails normally contain advertisements for products or services, and can also carry viruses, malware, spyware and so forth. Many people think spam mails do not cause any damage. In fact, spam mails increase management costs and cause resources to be used ineffectively. Therefore, verifying and filtering spam mails needs to be taken into consideration. The objective of this paper is to introduce a hybrid approach that combines three techniques (stop-word removal, n-gram extraction and data classification) to filter spam emails and simplify system development. Because it is language independent, the proposed hybrid approach can be applied widely across different languages. To examine the approach, the CSDMC2010 spam mail corpus, comprising 198 common emails, 202 spam mails, and 10 selective emails, was used in an experimental study. The results showed that the proposed technique could determine whether an email is spam with 93.2% accuracy. Hence, this hybrid approach could benefit all users and organizations by decreasing computer risk.
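
The three stages named in the summary (stop-word removal, n-gram extraction, classification) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not say which classifier, n-gram length, or stop-word list the paper uses, so the multinomial Naive Bayes classifier, the trigram length, the `STOPWORDS` tuple, and the toy training messages below are all assumptions made for illustration. Stop-words are stripped as plain substrings and n-grams are taken over characters, so no word segmentation is needed, which is one plausible reading of "language independent". Substring stripping is deliberately naive and can also delete matches inside longer words.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list; the paper's actual list is not given in this record.
STOPWORDS = ("the", "and", "for")


def remove_stopwords(text, stopwords=STOPWORDS):
    """Strip stop-words as raw substrings, so no word segmentation is required."""
    for word in sorted(stopwords, key=len, reverse=True):
        text = text.replace(word, "")
    return text


def char_ngrams(text, n=3):
    """Return every overlapping character n-gram of the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(samples, n=3):
    """Count n-grams per label (assumed classifier: multinomial Naive Bayes)."""
    counts = defaultdict(Counter)  # label -> n-gram frequencies
    docs = Counter()               # label -> number of training messages
    for text, label in samples:
        docs[label] += 1
        counts[label].update(char_ngrams(remove_stopwords(text), n))
    return counts, docs


def classify(text, model, n=3):
    """Return the label with the highest Laplace-smoothed log-probability."""
    counts, docs = model
    vocab = set()
    for grams in counts.values():
        vocab.update(grams)
    total_docs = sum(docs.values())
    features = char_ngrams(remove_stopwords(text), n)
    best_label, best_score = None, -math.inf
    for label, grams in counts.items():
        total = sum(grams.values())
        score = math.log(docs[label] / total_docs)  # class prior
        for gram in features:
            # Add-one smoothing; the extra +1 reserves mass for unseen n-grams.
            score += math.log((grams[gram] + 1) / (total + len(vocab) + 1))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for this sketch (not taken from CSDMC2010).
TRAINING = [
    ("win free prize now", "spam"),
    ("free prize for the winner", "spam"),
    ("meeting agenda for the team", "ham"),
    ("project report and meeting notes", "ham"),
]

model = train(TRAINING)
print(classify("free prize inside", model))
print(classify("team meeting report", model))
```

With real data, the same pipeline would be trained on labelled messages from a corpus such as CSDMC2010 and evaluated on held-out messages; the four toy messages here only show how the stages connect.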