Comparative Analysis of Combinations of Dimension Reduction and Data Mining Techniques for Malware Detection

Bibliographic Details
Main Authors: Fernandez, Proceso L, Jr; Yiu, Jeffrey C; Arana, Paul Albert R
Format: text
Published: Archīum Ateneo 2010
Online Access: https://archium.ateneo.edu/discs-faculty-pubs/198
https://archium.ateneo.edu/cgi/viewcontent.cgi?article=1197&context=discs-faculty-pubs
Institution: Ateneo De Manila University
Description
Summary: Many malware detectors utilize data mining techniques as primary tools for pattern recognition. As the number of new and evolving malware continues to rise, there is an increasing need for faster and more accurate detectors. However, for a given malware detector, detection speed and accuracy are usually inversely related. This study explores several configurations of classification combined with feature selection. An optimization function involving accuracy and processing time is used to evaluate each configuration. A real data set provided by Trend Micro Philippines is used for the study. Among 18 different configurations studied, it is shown that J4.8 without feature selection is best for cases where accuracy is extremely important. On the other hand, when time performance is more crucial, applying a Naïve Bayes classifier on a reduced data set (using Gain Ratio Attribute Evaluation to select the top 35 features only) gives the best results.
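
The summary mentions ranking features by gain ratio and keeping only the top-scoring ones before classification. As a rough illustration of that idea, the sketch below computes gain ratio (information gain divided by the feature's own entropy) for discrete features and keeps the best k. It is not the study's implementation: the function names, the feature names, and the toy data are all hypothetical, and the study's data set and its top-35 cutoff are not reproduced here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by the feature's entropy."""
    total = len(labels)
    # Group the class labels by the feature value they co-occur with.
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    # A constant feature has zero split information and carries no signal.
    return info_gain / split_info if split_info > 0 else 0.0


def top_k_features(columns, labels, k):
    """Return the names of the k features with the highest gain ratio."""
    scores = {name: gain_ratio(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data, made up for illustration: 1 = malware, 0 = benign.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "calls_crypto_api": [1, 1, 1, 0, 0, 0],  # perfectly separates the classes
    "has_icon": [1, 0, 1, 0, 1, 0],          # only weakly related to the class
    "is_pe_file": [1, 1, 1, 1, 1, 1],        # constant, so uninformative
}
selected = top_k_features(columns, labels, k=2)
print(selected)
```

The reduced feature set could then be passed to any classifier; fewer features generally means less processing time per sample, which is the speed side of the trade-off the summary describes.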