Machine learning based Android malware detection

Bibliographic Details
Main Author: Huang, Hanlin
Other Authors: Chen Lihui
Format: Theses and Dissertations
Language: English
Published: 2018
Online Access: http://hdl.handle.net/10356/76375
Institution: Nanyang Technological University
Description
Summary: As the Android operating system continues to thrive on mobile platforms, it has also spawned a large amount of malicious software, exposing its users to serious security threats. How to detect malicious software effectively has therefore become an active research topic. The static detection methods used previously depend heavily on analyzing and comparing the source code of Android applications. Faced with diverse and rapidly evolving malware, such methods have many limitations. To address these issues, this report makes two main contributions. (1) Feature extraction is implemented and used for classification/prediction. Building on traditional machine-learning malware detection, the multiple feature sets extracted from open-source datasets are reduced yet used efficiently; experiments show that this improves the generalization capacity of the trained models while retaining high accuracy in malware classification and prediction. (2) Graph embedding for Android applications is implemented and used for malware prediction. Each graph is the API Dependence Graph (ADG) of one application. The technique is inspired by word embedding and document embedding, both of which use deep learning. Experiments in this report show that training backend classifiers on the graph-embedding results improves classification/prediction accuracy.
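
The feature-reduction idea in point (1) can be illustrated with a minimal sketch. The record does not say which feature sets or which selection criterion the thesis uses, so the following are assumptions made only for illustration: each application is represented as a set of requested permissions, features are ranked by information gain, and the helper names (`entropy`, `information_gain`, `select_top_k`, `to_vector`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(samples, labels, feature):
    """Drop in label entropy once we know whether `feature` is present."""
    present = [y for x, y in zip(samples, labels) if feature in x]
    absent = [y for x, y in zip(samples, labels) if feature not in x]
    total = len(labels)
    conditional = ((len(present) / total) * entropy(present)
                   + (len(absent) / total) * entropy(absent))
    return entropy(labels) - conditional


def select_top_k(samples, labels, k):
    """Keep the k features with the highest information gain."""
    vocabulary = sorted(set().union(*samples))
    ranked = sorted(vocabulary,
                    key=lambda f: information_gain(samples, labels, f),
                    reverse=True)
    return ranked[:k]


def to_vector(sample, selected):
    """Encode one application as a 0/1 vector over the selected features."""
    return [1 if f in sample else 0 for f in selected]


# Hypothetical toy data: permission sets with labels (1 = malware, 0 = benign).
apps = [
    {"SEND_SMS", "INTERNET", "READ_CONTACTS"},
    {"SEND_SMS", "INTERNET"},
    {"INTERNET", "CAMERA"},
    {"INTERNET"},
]
labels = [1, 1, 0, 0]

selected = select_top_k(apps, labels, k=2)
vectors = [to_vector(app, selected) for app in apps]
```

In this toy data `INTERNET` appears in every application and so carries no information about the label, while `SEND_SMS` separates the two classes perfectly and is ranked first. The reduced vectors would then be passed to whichever classifier is being trained; that step is omitted here.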