Performance-oriented and sustainability-oriented design of an effective Android malware detector

Bibliographic Details
Main Authors: Qadir, Sana; Naeem, Amna; Hussain, Mehdi; Ghafoor, Huma; Hassan Abdalla Hashim, Aisha
Format: Article
Language: English
Published: IEEE Access 2024
Online Access: http://irep.iium.edu.my/115657/1/115657_Performance-Oriented%20and%20Sustainability-Oriented.pdf
http://irep.iium.edu.my/115657/
https://ieeexplore.ieee.org/abstract/document/10734122
Institution: Universiti Islam Antarabangsa Malaysia
Description
Summary: Effective Android malware detection is a complex problem because of the rapidly evolving, complicated, and diverse nature of malware. The design of malware detectors should prioritise a high detection rate, efficient use of computational resources, and sustainability. Keeping these design priorities in mind, we develop and empirically evaluate four different classifiers. Firstly, to ensure a high detection rate, we use a dataset compiled using hybrid analysis of a diverse set of apps. Unlike most publicly available Android datasets, the dynamic analysis of each app was carried out on a real device and not on a virtual setup. This means that the dataset contains a comprehensive profile of sophisticated malware capable of changing its behaviour on a virtual setup. Secondly, to enhance efficiency, we explore the use of a GPU-based setup and different feature selection techniques. Lastly, we emphasise sustainability by training the models using apps that date back to the beginning of the Android ecosystem, i.e., from 2008 until 2020. Our results show that Random Forest (RF) is the most effective classifier, with the highest accuracy of 97.86%. This accuracy is 2.78% higher than the best accuracy reported in existing literature. The data also show that RF is the most sustainable classifier, with a minimal decrease in F1 score for over-time performance. With regard to efficiency, we find that Logistic Regression (LR) is the best option and that the training time of most models improves significantly when a GPU-based setup is used instead of a CPU-based setup.
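
The summary measures sustainability as the decrease in F1 score over time. As a rough illustration of what such a measurement could look like, the sketch below computes the F1 score of the malware class separately for each time period. The summary does not say how the authors computed their figures, so the function names (`f1_score`, `f1_by_period`) and the sample records are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def f1_score(y_true, y_pred):
    """F1 score for the positive class (1 = malware, 0 = benign)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision/recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_by_period(records):
    """Group (period, true_label, predicted_label) records and score each period."""
    grouped = defaultdict(lambda: ([], []))
    for period, true_label, predicted_label in records:
        grouped[period][0].append(true_label)
        grouped[period][1].append(predicted_label)
    return {
        period: f1_score(truths, preds)
        for period, (truths, preds) in sorted(grouped.items())
    }


# Made-up predictions for illustration only (NOT data from the paper).
records = [
    (2019, 1, 1), (2019, 1, 1), (2019, 0, 0), (2019, 0, 0),
    (2020, 1, 1), (2020, 1, 0), (2020, 0, 0), (2020, 0, 1),
]
scores = f1_by_period(records)
drop = scores[2019] - scores[2020]  # decrease in F1 from one period to the next
```

Under this reading, a more sustainable classifier is one for which `drop` stays small as the test apps move further in time from the training data.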