Comparative Analysis of Combinations of Dimension Reduction and Data Mining Techniques for Malware Detection

Many malware detectors utilize data mining techniques as primary tools for pattern recognition. As the number of new and evolving malware continues to rise, there is an increasing need for faster and more accurate detectors. However, for a given malware detector, detection speed and accuracy are usu...

Bibliographic Details
Main Authors: Fernandez, Proceso L, Jr, Yiu, Jeffrey C, Arana, Paul Albert R
Format: text
Published: Archīum Ateneo 2010
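Subjects: Malware Detection; Data Mining; Dimension Reduction; Feature Selection; Classification; Computer Sciences; Databases and Information Systems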
Online Access: https://archium.ateneo.edu/discs-faculty-pubs/198
https://archium.ateneo.edu/cgi/viewcontent.cgi?article=1197&context=discs-faculty-pubs
Institution: Ateneo De Manila University
id ph-ateneo-arc.discs-faculty-pubs-1197
record_format eprints
spelling ph-ateneo-arc.discs-faculty-pubs-1197 2020-09-09T06:54:47Z Comparative Analysis of Combinations of Dimension Reduction and Data Mining Techniques for Malware Detection Fernandez, Proceso L, Jr Yiu, Jeffrey C Arana, Paul Albert R Many malware detectors utilize data mining techniques as primary tools for pattern recognition. As the number of new and evolving malware continues to rise, there is an increasing need for faster and more accurate detectors. However, for a given malware detector, detection speed and accuracy are usually inversely related. This study explores several configurations of classification combined with feature selection. An optimization function involving accuracy and processing time is used to evaluate each configuration. A real data set provided by Trend Micro Philippines is used for the study. Among 18 different configurations studied, it is shown that J4.8 without feature selection is best for cases where accuracy is extremely important. On the other hand, when time performance is more crucial, applying a Naïve Bayes classifier on a reduced data set (using Gain Ratio Attribute Evaluation to select the top 35 features only) gives the best results. 2010-10-01T07:00:00Z text application/pdf https://archium.ateneo.edu/discs-faculty-pubs/198 https://archium.ateneo.edu/cgi/viewcontent.cgi?article=1197&context=discs-faculty-pubs Department of Information Systems & Computer Science Faculty Publications Archīum Ateneo Malware Detection Data Mining Dimension Reduction Feature Selection Classification Computer Sciences Databases and Information Systems
institution Ateneo De Manila University
building Ateneo De Manila University Library
country Philippines
collection archium.Ateneo Institutional Repository
topic Malware Detection
Data Mining
Dimension Reduction
Feature Selection
Classification
Computer Sciences
Databases and Information Systems
description Many malware detectors utilize data mining techniques as primary tools for pattern recognition. As the number of new and evolving malware continues to rise, there is an increasing need for faster and more accurate detectors. However, for a given malware detector, detection speed and accuracy are usually inversely related. This study explores several configurations of classification combined with feature selection. An optimization function involving accuracy and processing time is used to evaluate each configuration. A real data set provided by Trend Micro Philippines is used for the study. Among 18 different configurations studied, it is shown that J4.8 without feature selection is best for cases where accuracy is extremely important. On the other hand, when time performance is more crucial, applying a Naïve Bayes classifier on a reduced data set (using Gain Ratio Attribute Evaluation to select the top 35 features only) gives the best results.
format text
author Fernandez, Proceso L, Jr
Yiu, Jeffrey C
Arana, Paul Albert R
title Comparative Analysis of Combinations of Dimension Reduction and Data Mining Techniques for Malware Detection
publisher Archīum Ateneo
publishDate 2010
url https://archium.ateneo.edu/discs-faculty-pubs/198
https://archium.ateneo.edu/cgi/viewcontent.cgi?article=1197&context=discs-faculty-pubs