On the effects of de-obfuscation on spam detection accuracy

Spam accounts for approximately two-thirds of e-mail traffic on the Internet [4] and is fast becoming a major problem for IT users and network administrators. Spam costs billions in lost productivity [13] and causes more problems than the mere annoyance of delayed and lost non-spam emai...

Bibliographic Details
Main Authors: M. E. Rafiq, A. Newaz; Marsono, Muhammad Nadzir; Gebali, Fayez
Format: Book Section
Published: Penerbit UTM 2007
Subjects: TK Electrical engineering. Electronics. Nuclear engineering
Online Access: http://eprints.utm.my/id/eprint/13680/
Institution: Universiti Teknologi Malaysia
id my.utm.13680
record_format eprints
spelling my.utm.13680 2017-10-08T01:13:15Z http://eprints.utm.my/id/eprint/13680/ PeerReviewed. M. E. Rafiq, A. Newaz and Marsono, Muhammad Nadzir and Gebali, Fayez (2007) On the effects of de-obfuscation on spam detection accuracy. In: Advances In Digital Signal Processing Applications. Penerbit UTM, Johor, pp. 159-172. ISBN 978-983-52-0652-8
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
topic TK Electrical engineering. Electronics. Nuclear engineering
description Spam accounts for approximately two-thirds of e-mail traffic on the Internet [4] and is fast becoming a major problem for IT users and network administrators. Spam costs billions in lost productivity [13] and causes more problems than the mere annoyance of delayed and lost non-spam e-mails. Naive Bayes classification has been widely used for spam detection, and several variations have been proposed [19], [1], [5]. In e-mail content classification (as in other supervised-learning techniques), the accuracy of spam detection depends on the frequency of spam features observed during training. Spam continuously evolves to circumvent detection systems and is becoming much more sophisticated [6]. Spammers obfuscate well-known spam features in different ways to circumvent spam detection [12]. Obfuscating spam features (even by substituting a character with a visually similar one) reduces the frequency and size of features observed during learning. Hence, if obfuscated spam features are de-obfuscated before detection, the accuracy of spam detection should increase. This chapter tests that claim experimentally on real spam e-mails.
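The idea in the description can be illustrated with a minimal sketch. It is not the chapter's implementation: the look-alike table `LOOKALIKES`, the helper names, and the four-message training set are all assumptions made up for illustration, and the classifier is a plain multinomial Naive Bayes with add-one smoothing rather than any of the cited variations.

```python
import math
import re
from collections import Counter

# Hypothetical look-alike table (an assumption, not taken from the chapter).
# It is deliberately crude: it would also rewrite genuine digits.
LOOKALIKES = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "@": "a", "$": "s", "!": "i"}
)


def deobfuscate(text):
    """Lower-case, map look-alike characters to letters, drop in-word separators."""
    text = text.lower().translate(LOOKALIKES)
    # Remove a separator sitting between two letters, e.g. "v.i.a.g.r.a".
    return re.sub(r"(?<=[a-z])[._\-*](?=[a-z])", "", text)


def tokens(text, preprocess=str.lower):
    """Split the preprocessed text into runs of letters."""
    return re.findall(r"[a-z]+", preprocess(text))


def train(messages, preprocess=str.lower):
    """Count tokens per label; messages is an iterable of (text, label) pairs."""
    counts = {"spam": Counter(), "ham": Counter()}
    docs = Counter()
    for text, label in messages:
        counts[label].update(tokens(text, preprocess))
        docs[label] += 1
    return counts, docs


def spam_log_odds(text, model, preprocess=str.lower):
    """Log-odds of spam vs. ham under multinomial Naive Bayes, add-one smoothing."""
    counts, docs = model
    vocab_size = len(set(counts["spam"]) | set(counts["ham"]))
    spam_total = sum(counts["spam"].values()) + vocab_size
    ham_total = sum(counts["ham"].values()) + vocab_size
    score = math.log(docs["spam"] / docs["ham"])
    for tok in tokens(text, preprocess):
        p_spam = (counts["spam"][tok] + 1) / spam_total
        p_ham = (counts["ham"][tok] + 1) / ham_total
        score += math.log(p_spam / p_ham)
    return score


# Toy training data, invented for this sketch.
training = [
    ("cheap viagra offer", "spam"),
    ("buy viagra now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
]
model = train(training, preprocess=deobfuscate)

message = "v1agra 0ffer"
raw = spam_log_odds(message, model)                 # tokens never seen in training
clean = spam_log_odds(message, model, deobfuscate)  # tokens match trained spam features
print(raw, clean)
```

On this toy data the obfuscated message breaks into fragments that never appeared in training, so its score stays near the prior, while the de-obfuscated form matches features learned from spam and scores higher. A realistic table of visually similar characters would need to be much larger and context-aware.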
format Book Section
author M. E. Rafiq, A. Newaz
Marsono, Muhammad Nadzir
Gebali, Fayez
title On the effects of de-obfuscation on spam detection accuracy
publisher Penerbit UTM
publishDate 2007
url http://eprints.utm.my/id/eprint/13680/