Android app representation for machine learning based malware detection

According to StatCounter, Android from Google is the most popular mobile operating system in the world, with a market share of 75.66%. Since applications can be installed from elsewhere and not only from the official application market, Google Play, malicious applications pose an invis...

Bibliographic Details
Main Author: Zheng, Dunyuan
Other Authors: Chen Lihui
Format: Theses and Dissertations
Language: English
Published: 2018
Online Access: http://hdl.handle.net/10356/75972
Institution: Nanyang Technological University
id sg-ntu-dr.10356-75972
record_format dspace
spelling sg-ntu-dr.10356-75972 2023-07-04T15:55:48Z Zheng, Dunyuan Chen Lihui School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering Master of Science (Signal Processing) 2018-09-11T05:18:17Z 2018-09-11T05:18:17Z 2018 Thesis http://hdl.handle.net/10356/75972 en 58 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Electrical and electronic engineering
description According to StatCounter, Android from Google is the most popular mobile operating system in the world, with a market share of 75.66%. Since applications can be installed from elsewhere and not only from the official application market, Google Play, malicious applications pose an invisible threat to the security of Android phones. The traditional Android malware detection approach collects suspicious samples and analyzes each one by comparing it with an existing database. The low accuracy and low efficiency of this method have led researchers and anti-virus companies to look for better techniques. Machine learning and deep learning techniques are now prominently used for malware detection. The project uses a unified framework for malware classification based on representation learning. First, MKLDroid is used to extract several graphs, including the control-flow graphs, from applications. Five views are integrated in this framework; only three of them are employed for analysis in this project. The Weisfeiler-Lehman graph (WL graph) kernel is applied to map the original graphs to a sequence of graphs (vectors). Graph2vec, a new method similar to doc2vec, is used to learn the embedding of the graphs in an unsupervised manner. Instead of applying a kernel function, the project simply concatenates the view files. A traditional classification technique such as a Support Vector Machine is then applied to the embedding to evaluate its performance. In the experimental studies, when the same dimension of 64 is used for both the embedding approach and the WL graph view approach, the classifier based on the embedding achieves 2% to 5% higher accuracy than the one using WL graphs. In addition, classifying the embedding is faster than classifying the WL graph views.
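The Weisfeiler-Lehman step mentioned in the description can be illustrated with a minimal sketch. It assumes small undirected graphs with string node labels, and the function names `wl_features` and `wl_kernel` are hypothetical, chosen only for this example. It is not the MKLDroid or thesis implementation, which is not given in this record.

```python
from collections import Counter


def wl_features(adjacency, labels, iterations=2):
    """Count Weisfeiler-Lehman labels for one labelled graph.

    adjacency: dict mapping node -> list of neighbour nodes (assumed undirected)
    labels:    dict mapping node -> initial string label
    """
    current = dict(labels)
    counts = Counter(current.values())  # iteration 0: the original labels
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adjacency.items():
            # New label = own label plus the sorted multiset of neighbour labels.
            neighbour_labels = sorted(current[n] for n in neighbours)
            updated[node] = current[node] + "(" + ",".join(neighbour_labels) + ")"
        current = updated
        counts.update(current.values())
    return counts


def wl_kernel(features_a, features_b):
    """Dot product of two label-count vectors (missing labels count as 0)."""
    return sum(count * features_b[label] for label, count in features_a.items())


# Two toy graphs with the same shape (a three-node path) but different node names.
g1 = wl_features({0: [1], 1: [0, 2], 2: [1]}, {0: "A", 1: "B", 2: "A"}, iterations=1)
g2 = wl_features({"x": ["y"], "y": ["x", "z"], "z": ["y"]},
                 {"x": "A", "y": "B", "z": "A"}, iterations=1)
print(g1 == g2, wl_kernel(g1, g2))
```

Because the relabelling depends only on labels and neighbourhoods, the two toy graphs produce identical count vectors even though their node names differ. The resulting counts are the kind of bag-of-substructures input that a doc2vec-style embedding or an SVM could consume.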
author2 Chen Lihui
format Theses and Dissertations
author Zheng, Dunyuan
title Android app representation for machine learning based malware detection