Large scale android malware detection

Bibliographic Details
Main Author: Kasim, Arief Kresnadi Ignatius
Other Authors: Chen Lihui
Format: Final Year Project
Language: English
Published: 2018
Online Access: http://hdl.handle.net/10356/75300
Institution: Nanyang Technological University
Description
Summary: Smartphones’ popularity and use have been increasing exponentially over the years. This growth also increases the potential for damage from malicious software, or malware for short. This is especially true for Android, which allows installation of third-party applications from non-official markets. Like any malware, Android malware presents a major security threat to Android devices, and malware creators hide it in the form of applications. As the number of Android applications increases over time, the issue of large-scale Android malware detection becomes even more serious. Researchers are trying to tackle this problem using machine learning, which is capable of producing more effective approaches to the analysis of large-scale data. However, the challenge in identifying Android malware using machine learning has always been in representing the data for analysis. Many approaches to representing application data have been proposed. Unfortunately, no technique has yet provided an efficient vector embedding of this data to which machine learning algorithms can be applied for Android malware analysis. A representation method based on graphs was devised so that the features captured from the applications would retain their semantic relations; this approach was built around deep learning. It was compared with state-of-the-art malware detectors that were re-implemented. In this project, previously proposed machine learning methods were re-implemented and tested on large datasets of tens of thousands in size. Simulations were conducted with various parameters, and the best results were recorded.