Experimental comparison of features, analyses, and classifiers for Android malware detection

Bibliographic Details
Main Authors: SHAR, Lwin Khin, DEMISSIE, Biniam Fisseha, CECCATO, Mariano, YAN, Naing Tun, LO, David, JIANG, Lingxiao, BIENERT, Christoph
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2023
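Subjects: malware detection; machine learning; deep learning; android; Information Security; Software Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/8211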
https://ink.library.smu.edu.sg/context/sis_research/article/9214/viewcontent/Empirical_Comparison_Malware_EMSE23_Jnl_Paper.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-9214
record_format dspace
datestamp 2023-10-13T09:22:57Z
issued 2023-09-01T07:00:00Z
mimetype application/pdf
doi 10.1007/s10664-023-10375-y
rights http://creativecommons.org/licenses/by-nc-nd/4.0/
series Research Collection School Of Computing and Information Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic malware detection
machine learning
deep learning
android
Information Security
Software Engineering
description Android malware detection has been an active area of research. In the past decade, several machine learning-based approaches have been proposed that rely on different types of features that may characterize Android malware behaviors. The commonly analyzed features include API usages and sequences at various abstraction levels (e.g., class and package), extracted using static or dynamic analysis. Features that characterize permission use, native API calls, and reflection have also been analyzed. Initial works used conventional classifiers such as Random Forest to learn from those features; in recent years, deep learning-based classifiers such as Recurrent Neural Networks have been explored. Given the variety of features, analyses, and classifiers proposed in the literature, there is a need for a comprehensive evaluation of current state-of-the-art Android malware classification on a common benchmark. In this study, we evaluate the performance of different types of features and compare a conventional classifier, Random Forest (RF), with a deep learning classifier, Recurrent Neural Network (RNN). To avoid temporal and spatial biases, we evaluate performance in a time- and space-aware setting in which classifiers are trained on older apps and tested on newer apps, and the distribution of test samples is representative of the in-the-wild malware-to-benign ratio. Features are extracted from a common benchmark of 7,860 benign samples and 5,912 malware samples, whose release years span from 2010 to 2020. Among other findings, our study shows that permission use features perform the best among the features we investigated; package-level features generally perform better than class-level features; static features generally perform better than dynamic features; and the RNN classifier performs better than the RF classifier when trained on sequence-type features.
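The time- and space-aware evaluation setting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `App` record, the `time_space_aware_split` helper, the single split year, and the choice to reach the target test ratio by subsampling malware are all hypothetical. The abstract does not state which split year or which malware-to-benign ratio the study used, so both are plain parameters here.

```python
import random
from dataclasses import dataclass


@dataclass
class App:
    """Hypothetical record for one labelled app in a benchmark."""
    app_id: str
    year: int          # release year
    is_malware: bool


def time_space_aware_split(apps, split_year, test_malware_ratio, seed=0):
    """Split apps so that training data is strictly older than test data
    (temporal constraint) and malware makes up roughly `test_malware_ratio`
    of the test set (spatial constraint). Assumed design, for illustration."""
    if not 0 < test_malware_ratio < 1:
        raise ValueError("test_malware_ratio must be strictly between 0 and 1")

    # Temporal constraint: train only on apps released before split_year.
    train = [a for a in apps if a.year < split_year]
    newer = [a for a in apps if a.year >= split_year]

    benign = [a for a in newer if not a.is_malware]
    malware = [a for a in newer if a.is_malware]

    # Spatial constraint: keep every newer benign app and subsample malware
    # so that m / (m + len(benign)) is close to test_malware_ratio.
    m = round(test_malware_ratio * len(benign) / (1 - test_malware_ratio))
    m = min(m, len(malware))  # cannot draw more malware than exist
    rng = random.Random(seed)
    test = benign + rng.sample(malware, m)
    return train, test


if __name__ == "__main__":
    # Synthetic data for illustration only (not the study's benchmark).
    apps = [App(f"b{i}", 2010 + i % 11, False) for i in range(90)]
    apps += [App(f"m{i}", 2010 + i % 11, True) for i in range(60)]
    train, test = time_space_aware_split(apps, split_year=2018, test_malware_ratio=0.1)
    print(len(train), len(test), sum(a.is_malware for a in test))  # prints: 111 27 3
```

Keeping every newer benign app and subsampling only malware is one simple way to approximate a realistic test distribution. If too few newer malware samples exist, the cap on `m` means the achieved ratio falls below the target.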
format text
author SHAR, Lwin Khin
DEMISSIE, Biniam Fisseha
CECCATO, Mariano
YAN, Naing Tun
LO, David
JIANG, Lingxiao
BIENERT, Christoph
title Experimental comparison of features, analyses, and classifiers for Android malware detection
publisher Institutional Knowledge at Singapore Management University
publishDate 2023
url https://ink.library.smu.edu.sg/sis_research/8211
https://ink.library.smu.edu.sg/context/sis_research/article/9214/viewcontent/Empirical_Comparison_Malware_EMSE23_Jnl_Paper.pdf