Online Android malware family detection approaches based on progressive learning
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2018 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/75952 |
Institution: | Nanyang Technological University |
Summary: | Over the past decade, smartphones have become an important part of people's lives. Android
is the most widely used smartphone operating system, and users' needs such as online
shopping, social connection, entertainment and mobile payment can be fulfilled by
applications running on the mobile operating system.
In recent years, Android malware has been developing rapidly, and Android malware
detection has been studied by researchers. Batch learning is the basic model for
most machine learning based Android malware detection research. However, the
assumption of batch learning that malicious applications do not evolve over time does
not hold in the real world. Because the model is trained on an existing malware dataset,
its performance degrades when it predicts forthcoming data samples.
In this dissertation, we first verify the reproducibility of two existing batch learning
based malware detection methods, DREBIN [1] and CSBD [2], by conducting multiclass
classification tasks using Support Vector Machine and Random Forest on feature data
extracted by DREBIN and CSBD respectively. The results show that high accuracy
and acceptable efficiency can be reproduced on a different dataset.
Then, to enable the models to learn new classes of malware, we conduct experiments
with annual and semi-annual retraining. Compared with the experiments without
retraining, the retraining results imply that retraining indeed enables the
models to learn new families of malware as they stream in. However, retraining is
an increasingly costly process since the size of the training dataset increases over time.
Lastly, Progressive Learning [3] is applied to adjust the models so that they learn new
families of malware as soon as one sample emerges. The accuracy improves significantly
compared with the retraining experiments. However, the retraining and progressive
learning processes can be time consuming since the models are adjusted each time a data
sample emerges. |
---|
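
The summary contrasts batch models, which only know the families present at training time, with a model that is adjusted as each new sample emerges. A minimal sketch of that online idea follows. It is not the Progressive Learning algorithm of [3], nor the DREBIN or CSBD pipeline: it swaps in a simple nearest-centroid classifier, and the class name `OnlineCentroidClassifier`, the helper `prequential_accuracy`, the feature vectors and the family names are all hypothetical, chosen only to show a label set that grows when a sample of an unseen family arrives.

```python
from collections import defaultdict


class OnlineCentroidClassifier:
    """Nearest-centroid classifier updated one sample at a time.

    Hypothetical stand-in for illustration; NOT the Progressive Learning
    algorithm of [3]. It only shows a new class (malware family) being
    added the first time one of its samples arrives.
    """

    def __init__(self):
        self.sums = {}                  # family -> per-feature running sum
        self.counts = defaultdict(int)  # family -> number of samples seen

    def predict(self, x):
        """Return the family with the closest centroid, or None if untrained."""
        best, best_dist = None, float("inf")
        for family, total in self.sums.items():
            n = self.counts[family]
            dist = sum((xi - ti / n) ** 2 for xi, ti in zip(x, total))
            if dist < best_dist:
                best, best_dist = family, dist
        return best

    def learn_one(self, x, family):
        """Update with one labeled sample; an unseen family is added on the fly."""
        if family not in self.sums:
            self.sums[family] = [0.0] * len(x)
        self.sums[family] = [t + xi for t, xi in zip(self.sums[family], x)]
        self.counts[family] += 1


def prequential_accuracy(model, stream):
    """Test-then-train: predict each sample before learning from it."""
    correct = 0
    for x, family in stream:
        if model.predict(x) == family:
            correct += 1
        model.learn_one(x, family)
    return correct / len(stream)


# Toy binary feature vectors; the family names are made up for this example.
stream = [
    ([1, 0, 0], "FamilyA"),
    ([1, 0, 0], "FamilyA"),
    ([0, 1, 0], "FamilyB"),  # a new family appears mid-stream
    ([0, 1, 0], "FamilyB"),
    ([0, 0, 1], "FamilyC"),  # another new family
    ([0, 0, 1], "FamilyC"),
]

model = OnlineCentroidClassifier()
accuracy = prequential_accuracy(model, stream)
```

In this toy stream the first sample of each family is necessarily misclassified, because the model cannot name a family it has never seen, while the second sample is recognized after a single update. The cost noted in the summary is also visible here: the model is touched once per incoming sample rather than once per retraining period.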