Android app representation for machine learning based malware detection
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2018 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/75972 |
Institution: | Nanyang Technological University |
Summary: | According to StatCounter, Android from Google is the most popular mobile operating system in the world, with a market share of 75.66%. Because applications can be installed from other sources and not only from the official application market, Google Play, malicious applications pose an invisible threat to the security of Android phones.
The traditional Android malware detection approach collects suspicious samples and analyzes each one by comparing it with an existing database. The low accuracy and low efficiency of this method have led researchers and anti-virus companies to look for better techniques. Machine learning and deep learning techniques are now prominently used for malware detection.
The project uses a unified framework for malware classification based on representation learning. First, MKLDroid is used to extract several graphs, including control-flow graphs, from applications. The framework integrates five views, of which only three are employed in this project. The Weisfeiler-Lehman (WL) graph kernel is applied to map the original graphs to a sequence of graphs (vectors). Graph2vec, a new method similar to doc2vec, is used to learn embeddings of the graphs in an unsupervised manner. Instead of applying a kernel function, the project simply concatenates the view files. A traditional classification technique such as a Support Vector Machine is then applied to the embeddings to evaluate their performance. In the experimental studies, when the same dimension of 64 is used for both the embedding approach and the WL graph view approach, the classifier based on the embeddings achieves 2% to 5% higher accuracy than the one using WL graphs. In addition, classifying the embeddings is faster than classifying the WL graph views. |
---|
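The WL relabeling and view-concatenation steps named in the summary can be illustrated with a minimal sketch. It is not the thesis code and does not use the MKLDroid or graph2vec APIs. The function names `wl_subtree_features` and `concatenate_views`, the toy graph, and the choice to treat edges as undirected are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def wl_subtree_features(adjacency, labels, iterations=2):
    """Count Weisfeiler-Lehman subtree labels for one graph.

    adjacency: dict mapping each node to a list of its neighbours
               (assumption: edges are treated as undirected).
    labels:    dict mapping each node to its initial label string.
    Returns a Counter over all labels seen in iterations 0..iterations.
    """
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adjacency.items():
            # A node's new label combines its own label with the sorted
            # multiset of its neighbours' labels.
            neighbourhood = sorted(current[n] for n in neighbours)
            signature = current[node] + "|" + ",".join(neighbourhood)
            # Compress the signature into a short, fixed-length label.
            digest = hashlib.sha256(signature.encode("utf-8")).hexdigest()
            updated[node] = digest[:8]
        current = updated
        features.update(current.values())
    return features


def concatenate_views(view_vectors):
    """Join per-view feature vectors end to end into one vector."""
    combined = []
    for vector in view_vectors:
        combined.extend(vector)
    return combined


# Hypothetical three-node graph standing in for a control-flow graph.
toy_adjacency = {0: [1], 1: [0, 2], 2: [1]}
toy_labels = {0: "entry", 1: "call", 2: "exit"}
toy_features = wl_subtree_features(toy_adjacency, toy_labels, iterations=2)

# Hypothetical embeddings of one app under two views, joined into one input.
toy_input = concatenate_views([[0.1, 0.2], [0.3]])
```

The label counts play the role that words play in doc2vec: an embedding method in the style of graph2vec would consume them as the "document" for each graph, and the concatenated vector is what a classifier such as a Support Vector Machine would receive.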