Performance-oriented and sustainability-oriented design of an effective Android malware detector
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE Access, 2024 |
Online Access: | http://irep.iium.edu.my/115657/1/115657_Performance-Oriented%20and%20Sustainability-Oriented.pdf http://irep.iium.edu.my/115657/ https://ieeexplore.ieee.org/abstract/document/10734122 |
Institution: | Universiti Islam Antarabangsa Malaysia |
Summary: | Effective Android malware detection is a complex problem because of the rapidly evolving, complicated, and diverse nature of malware. The design of malware detectors should prioritise high detection rate, efficient use of computational resources, and sustainability. Keeping these design priorities in mind, we develop and empirically evaluate four different classifiers. Firstly, to ensure a high detection rate, we use a dataset compiled using hybrid analysis of a diverse set of apps. Unlike most publicly available Android datasets, the dynamic analysis of each app was carried out on a real device and not on a virtual setup. This means that the dataset contains a comprehensive profile of sophisticated malware capable of changing its behaviour on a virtual setup. Secondly, to enhance efficiency, we explore the use of a GPU-based setup and different feature selection techniques. Lastly, we emphasise sustainability by training the models using apps that date back to the beginning of the Android ecosystem, i.e., from 2008 until 2020. Our results show that Random Forest (RF) is the most effective classifier, with the highest accuracy of 97.86%. This accuracy is 2.78% higher than the best accuracy reported in existing literature. The data also show that RF is the most sustainable classifier, with minimal decrease in F1 score for over-time performance. With regard to efficiency, we find that Logistic Regression (LR) is the best option and that the training time of most models improves significantly when a GPU-based setup is used instead of a CPU-based setup. |