Android malware detection through online learning
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2018 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/75961 |
Institution: | Nanyang Technological University |
Summary: | Because it is free, open source, and portable, Android has grown rapidly in the mobile market and has become the most widely used mobile operating system. At the same time, however, an endless stream of malicious software has been developed for it. Detecting such malware accurately and efficiently has therefore become an important research topic in this field.
In recent years, several machine-learning-based Android malware detection approaches have been proposed and have achieved very good results. Most of them, however, rely on batch learning models. Because Android applications are subject to concept drift over time, the performance of these models degrades as new applications are developed.
To gain deeper insight into existing malware detection approaches, I studied two state-of-the-art approaches (DREBIN and CSBD) and reimplemented them in a series of experiments to investigate how their performance responds to concept drift in Android applications. The results showed that concept drift has a significant impact on the performance of both approaches. To address this, I conducted retraining experiments and replaced the batch learning models with online learning models, which improved the models' robustness against concept drift to some extent. More specifically, the online learning algorithm achieved a significant improvement in accuracy, keeping the cumulative error rate at an extremely low level of 1% throughout testing on the whole dataset of more than 80,000 Android applications. Furthermore, online learning makes it possible to update classifiers without retraining them, which makes the training and testing process much less time-consuming. Experiments demonstrated that, with the online approach, CSBD needs only 1/20 of the time for classifier adaptation compared with batch retraining, without compromising accuracy. |
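The summary's argument (a batch model degrades under concept drift, while an online model can adapt without retraining) can be illustrated with a toy experiment. The sketch below is not the thesis's implementation and makes several assumptions: apps are represented as sets of binary string features in the spirit of DREBIN's feature vectors (all feature names here are hypothetical), a plain perceptron stands in for the online learning algorithm (the abstract does not name it), and the data is a tiny synthetic stream rather than the real 80,000-app dataset. The batch model is trained once and frozen; the online model starts from the same weights but keeps learning in a test-then-train loop. Both report a cumulative error rate, taken here as mistakes divided by apps seen.

```python
import copy

BIAS = "__bias__"  # hypothetical always-on feature acting as the bias term


class OnlinePerceptron:
    """Linear classifier over sets of binary string features; +1 = malware, -1 = benign."""

    def __init__(self):
        self.w = {}  # sparse weight vector: feature name -> weight

    def score(self, features):
        return sum(self.w.get(f, 0.0) for f in features | {BIAS})

    def predict(self, features):
        return 1 if self.score(features) > 0 else -1

    def update(self, features, label):
        # Perceptron rule: on a mistake, move the weights of the app's features toward its label.
        if self.predict(features) != label:
            for f in features | {BIAS}:
                self.w[f] = self.w.get(f, 0.0) + label


def make_stream(n, drifted):
    """Alternating malware/benign apps; the feature names are illustrative only."""
    benign = frozenset({"perm:CAMERA", "perm:INTERNET"})
    if drifted:
        # After the drift, malware drops SEND_SMS, uses a new API, and also requests CAMERA.
        malware = frozenset({"api:DexClassLoader", "perm:CAMERA", "perm:INTERNET"})
    else:
        malware = frozenset({"perm:SEND_SMS", "perm:INTERNET"})
    return [(malware, 1) if i % 2 == 0 else (benign, -1) for i in range(n)]


def train_batch(stream, epochs=5):
    """Batch-style training: several passes over a fixed training set, then the model is frozen."""
    model = OnlinePerceptron()
    for _ in range(epochs):
        for features, label in stream:
            model.update(features, label)
    return model


def cumulative_error_rate(model, stream, adapt):
    """Prequential evaluation: predict each app first, then (optionally) learn from its label."""
    mistakes = 0
    for features, label in stream:
        if model.predict(features) != label:
            mistakes += 1
        if adapt:
            model.update(features, label)
    return mistakes / len(stream)


train = make_stream(100, drifted=False)
test = make_stream(200, drifted=True)

batch = train_batch(train)
online = copy.deepcopy(batch)  # same starting point as the batch model

batch_err = cumulative_error_rate(batch, test, adapt=False)
online_err = cumulative_error_rate(online, test, adapt=True)
print(f"batch: {batch_err:.3f}, online: {online_err:.3f}")  # batch: 0.500, online: 0.020
```

Once the malware in the test stream adopts a new API call and a benign-looking permission, the frozen model misses every malicious app. The online model makes four mistakes and then adapts through cheap per-sample updates, with no retraining pass. A margin-based online learner such as Passive-Aggressive could replace the perceptron without changing the evaluation loop.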