From market to device : adaptive and efficient malware detection for Android
| Field | Details |
|---|---|
| Format | Thesis-Doctor of Philosophy |
| Language | English |
| Published | Nanyang Technological University, 2021 |
| Online Access | https://hdl.handle.net/10356/145982 |
| Institution | Nanyang Technological University |

Summary:

In the past few years, Android's market share has risen to a leading position. With such a large user base, the number of Android applications on Google Play had grown to 3 million by 2018. However, not every application on the market is free of security risks. API misuse and incorrect API invocation by developers can cause significant data leakage or noticeably degrade the user experience. Because of the complexity of the Android system and the diversity of real-world usage scenarios, it is difficult to solve all of these problems with a single, straightforward approach. We therefore address Android security problems separately for different usage scenarios.
A precise representation of attacks can improve both the accuracy and the efficiency of malware detection. However, attacks on the Android platform are still far from being described precisely. In addition, new Android features, such as its communication mechanisms, introduce new challenges for attack detection. To address these problems from the perspective of service providers and security researchers, we propose abstract attack models that precisely capture the semantics of various Android attacks, including their targets, the behaviors involved, and the execution dependencies among those behaviors. We also construct a novel graph-based model, the ICCG (Inter-component Communication Graph), to describe applications' internal control flows and inter-component communications. These models cover more communication channels while preserving as much of the program logic as possible. Guided by the attack models, we propose a static search approach to detect attacks hidden in the ICCG. To reduce the false positive rate, we add a dynamic confirmation step that checks whether each detected attack is a false alarm. Experiments show that our integrated malware detection system, DroidEcho, detects attacks in both benchmark and real-world applications effectively and efficiently, with a precision of 89.5%.
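The attack-model-guided static search can be illustrated with a minimal sketch. It assumes a heavily simplified ICCG (an adjacency list whose edges mix intra-component control flow and inter-component communication), a set of behavior labels per node, and an attack model reduced to an ordered list of behaviors that must execute in sequence. The function `find_attack_path`, the node names, and the behavior labels are hypothetical and do not reproduce DroidEcho's actual models. The dynamic confirmation step is not shown.

```python
from collections import deque
from typing import Optional

# Hypothetical, simplified ICCG: each node is a method-level program point
# ("Component.method"); each edge is either an intra-component control-flow
# edge or an inter-component communication (ICC) edge, e.g. via an Intent.
Graph = dict[str, list[str]]


def find_attack_path(edges: Graph,
                     behaviors: dict[str, set[str]],
                     attack_model: list[str]) -> Optional[list[str]]:
    """Return a path whose nodes exhibit the attack model's behaviors in
    order (respecting their execution dependency), or None if none exists."""
    if not attack_model:
        return []
    nodes = set(edges) | {v for succs in edges.values() for v in succs}
    # Search state: (node, number of attack-model steps matched so far).
    parent = {}
    queue = deque()
    for node in sorted(nodes):
        if attack_model[0] in behaviors.get(node, set()):
            parent[(node, 1)] = None
            queue.append((node, 1))
    while queue:
        node, matched = queue.popleft()
        if matched == len(attack_model):
            # Walk parent links back to a start node to rebuild the path.
            path, state = [], (node, matched)
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        for succ in edges.get(node, []):
            # Advance greedily when the successor performs the next behavior.
            hit = attack_model[matched] in behaviors.get(succ, set())
            step = matched + 1 if hit else matched
            if (succ, step) not in parent:
                parent[(succ, step)] = (node, matched)
                queue.append((succ, step))
    return None


edges = {
    "MainActivity.onCreate": ["MainActivity.startLeak"],
    "MainActivity.startLeak": ["LeakService.onStartCommand"],  # ICC edge
    "LeakService.onStartCommand": ["LeakService.upload"],
    "LeakService.upload": [],
}
behaviors = {
    "MainActivity.onCreate": {"read_contacts"},
    "LeakService.upload": {"network_send"},
}
print(find_attack_path(edges, behaviors, ["read_contacts", "network_send"]))
```

Tracking (node, steps matched) pairs rather than nodes alone lets the breadth-first search reach the same node at different stages of an attack while still terminating on cyclic graphs. Greedily advancing on each match is enough here because checking an ordered behavior list along a path is a subsequence test.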
However, while apps in the official market (i.e., the Google Play Store) can be vetted with a heavyweight, complex detection approach such as DroidEcho, apps from unofficial markets and other third-party sources continue to pose serious security threats to end users. Moreover, downloading an app and then uploading it to a server for detection is time-consuming because of the network transmission overhead, and the upload process is itself exposed to attackers. Consequently, a last line of defense on the mobile device itself is necessary.
To address this problem, we propose MobiTive, an effective Android malware detection system that leverages customized deep neural networks to provide real-time, responsive detection on mobile devices. MobiTive is a pre-installation solution rather than an app scanning and monitoring engine used after installation, which makes it more practical and secure. Although deep learning-based malware detection can run efficiently on the server side, unmodified deep learning models cannot be deployed and executed directly on mobile devices because of limited computation power, memory size, and energy. We therefore investigate the following key points: (1) the performance of feature extraction methods based on source code or binary code; (2) the performance of different feature type selections for deep learning on mobile devices; (3) the detection accuracy of different deep neural networks on mobile devices; (4) real-time detection performance and accuracy on different mobile devices; and (5) the potential of on-device detection given the evolution of mobile device specifications. Finally, we propose MobiTive as a practical solution for detecting Android malware on mobile devices.
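MobiTive's exact features and network architecture are not described in this summary. The following is only a minimal sketch, assuming syntax features (permissions and API calls, the feature types this abstract names) are encoded as a binary vector over a fixed vocabulary and scored by a tiny fully connected network. `VOCAB`, `TOY_PARAMS`, and `malware_score` are hypothetical, and the weights are toy values rather than a trained model. A real deployment would load trained weights into an on-device inference runtime instead of evaluating layers in pure Python.

```python
import math

# Hypothetical vocabulary of syntax features (permissions and API calls),
# fixed offline and shipped to the device together with the trained model.
VOCAB = [
    "android.permission.READ_SMS",
    "android.permission.SEND_SMS",
    "android.permission.INTERNET",
    "SmsManager.sendTextMessage",
    "URL.openConnection",
]
INDEX = {name: i for i, name in enumerate(VOCAB)}


def to_feature_vector(features):
    """Binary vector: 1.0 where the app uses a vocabulary feature, else 0.0.
    Features outside the vocabulary are ignored."""
    vec = [0.0] * len(VOCAB)
    for name in features:
        i = INDEX.get(name)
        if i is not None:
            vec[i] = 1.0
    return vec


def dense(x, weights, bias, activation):
    """One fully connected layer: activation(W x + b), one output per row."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def malware_score(features, params):
    """Score in (0, 1); higher means more likely malicious."""
    hidden = dense(to_feature_vector(features), params["w1"], params["b1"], relu)
    (score,) = dense(hidden, params["w2"], params["b2"], sigmoid)
    return score


# Toy weights for illustration only; a real model would be trained offline.
TOY_PARAMS = {
    "w1": [[1.0, 1.0, 0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0, 0.0, 1.0]],
    "b1": [-1.0, -1.0],
    "w2": [[2.0, 1.0]],
    "b2": [-1.0],
}

sms_app = {"android.permission.READ_SMS", "android.permission.SEND_SMS",
           "SmsManager.sendTextMessage"}
print(round(malware_score(sms_app, TOY_PARAMS), 4))  # prints 0.9526
```

Keeping the vocabulary fixed means every app maps to a vector of the same length, which is what a fixed-shape on-device model expects as input.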
The evaluations of MobiTive show that syntactic features, such as permissions and API calls, lack the semantics needed to represent potentially malicious behaviors, and those semantics would yield more robust and accurate malware detection models. We therefore propose SeqMobile, an efficient Android malware detection system that adopts behavior-based sequence features and runs customized deep neural networks on the mobile device instead of on the server. Unlike traditional server-side sequence-based approaches, SeqMobile adopts three effective performance optimization methods that reduce feature extraction and prediction time, so that it meets the performance demands of mobile devices. To evaluate the effectiveness and efficiency of the system, we conduct experiments on: (1) the detection accuracy of different recurrent neural networks (RNNs); (2) feature extraction performance on different mobile devices; and (3) detection accuracy and prediction time for different sequence lengths. The results show that SeqMobile detects malware effectively and with high accuracy. Moreover, our performance optimization methods improve training and prediction performance by at least a factor of two. To explore what further optimization the state-of-the-art TensorFlow Model Optimization Toolkit can offer our sequence-based approach, we also evaluate the toolkit; this evaluation can serve as guidance for other systems built on sequence-based learning. Overall, we conclude that our sequence-based approach, together with our performance optimization methods, enables efficient malware detection within the performance constraints of mobile devices.
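The sequence-length experiments reflect a general property of recurrent networks: they process one token per time step, so a fixed input length bounds both how much behavior the model sees and how much each prediction costs. A minimal sketch of that preprocessing step, assuming API-call traces are mapped to integer tokens and then truncated or padded to a fixed length. `API_VOCAB`, `encode_sequence`, and the keep-the-first-calls truncation policy are hypothetical and are not SeqMobile's documented feature pipeline.

```python
# Hypothetical token ids for behavior-level API calls. Id 0 is reserved for
# padding and id 1 for calls that are not in the vocabulary.
PAD, UNK = 0, 1
API_VOCAB = {"getDeviceId": 2, "openConnection": 3, "sendTextMessage": 4}


def encode_sequence(calls, max_len):
    """Map an ordered API-call trace to exactly max_len token ids.

    Longer traces are truncated (this sketch keeps the first max_len calls);
    shorter ones are right-padded with PAD so every input has the same shape.
    """
    ids = [API_VOCAB.get(call, UNK) for call in calls[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


trace = ["getDeviceId", "Log.d", "openConnection", "sendTextMessage"]
for max_len in (2, 4, 6):
    print(max_len, encode_sequence(trace, max_len))
```

With `max_len=2` the sketch drops the `sendTextMessage` call entirely, which illustrates the trade-off: shorter sequences make each RNN prediction cheaper but may cut off the calls that reveal malicious behavior.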