Online Android malware family detection approaches based on progressive learning

Bibliographic Details
Main Author: Zhang, Qiming
Other Authors: Chen Lihui
Format: Theses and Dissertations
Language: English
Published: 2018
Online Access: http://hdl.handle.net/10356/75952
Institution: Nanyang Technological University
Description
Summary: Over the past decade, smartphones have become an important part of people's lives. Android is the most widely used smartphone operating system, and users' needs such as online shopping, social networking, entertainment and mobile payment can be met by applications running on a mobile operating system. In recent years, Android malware has developed rapidly, and Android malware detection has been studied by researchers. Batch learning is the basic model for most machine learning based Android malware detection research. However, the batch learning assumption that malicious applications do not evolve over time does not hold in the real world. Because the model is trained on an existing malware dataset, its performance degrades when it predicts forthcoming data samples. In this dissertation, we first verify the reproducibility of the existing batch learning based malware detection methods DREBIN [1] and CSBD [2] by conducting multiclass classification tasks using Support Vector Machine and Random Forest on feature data extracted by DREBIN and CSBD respectively. The results show that high accuracy and acceptable efficiency can be reproduced on a different dataset. Then, to enable the models to learn new classes of malware, we conduct experiments with annual and semi-annual retraining. Compared with the experiments without retraining, the results imply that retraining does enable the models to learn new families of malware as they stream in. However, retraining becomes increasingly costly, since the size of the training dataset increases over time. Lastly, Progressive Learning [3] is applied to adjust the models to learn new families of malware as soon as one sample emerges. The accuracy improves significantly compared with the retraining experiments. However, retraining and progressive learning can be time consuming, since the models are adjusted each time a data sample emerges.
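
The idea of adjusting a model each time one sample emerges, including a sample from a malware family not seen before, can be illustrated with a small sketch. This is not the Progressive Learning technique of [3], nor the DREBIN or CSBD pipelines; it swaps in a simple nearest-centroid classifier as a stand-in. The class name, the function name, the family labels and the four-element feature vectors are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class OnlineCentroidClassifier:
    """Nearest-centroid classifier updated one sample at a time.

    Illustrative stand-in only: NOT the Progressive Learning method of [3].
    """

    def __init__(self):
        self.sums = {}                  # family label -> per-feature running sum
        self.counts = defaultdict(int)  # family label -> number of samples seen

    def partial_fit(self, features, family):
        # A previously unseen family gets a new centroid on its first sample.
        if family not in self.sums:
            self.sums[family] = [0.0] * len(features)
        self.sums[family] = [s + x for s, x in zip(self.sums[family], features)]
        self.counts[family] += 1

    def predict(self, features):
        # Before any sample has been seen there is nothing to predict.
        if not self.sums:
            return None

        def squared_distance(family):
            n = self.counts[family]
            return sum((s / n - x) ** 2
                       for s, x in zip(self.sums[family], features))

        return min(self.sums, key=squared_distance)


def run_stream(stream):
    """Predict each sample first, then update the model with its true label."""
    model = OnlineCentroidClassifier()
    correct = 0
    for features, family in stream:
        if model.predict(features) == family:
            correct += 1
        model.partial_fit(features, family)
    accuracy = correct / len(stream) if stream else 0.0
    return model, accuracy


# Hypothetical binary features (for example, presence of a permission or API call).
stream = [
    ([1, 0, 0, 1], "FamilyA"),
    ([1, 0, 0, 0], "FamilyA"),
    ([0, 1, 1, 0], "FamilyB"),  # a new family appears mid-stream
    ([0, 1, 1, 1], "FamilyB"),
    ([1, 0, 0, 1], "FamilyA"),
]
model, accuracy = run_stream(stream)
```

In this sketch the first sample of each family is necessarily misclassified, because the model has no centroid for it yet; later samples of that family can be recognised without retraining on the full accumulated dataset. The per-sample update also shows why this style of learning does work on every arriving sample, which is the time cost the summary mentions.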