Differential training: A generic framework to reduce label noises for Android malware detection

A common problem in machine learning-based malware detection is that training data may contain noisy labels and it is challenging to make the training data noise-free at a large scale. To address this problem, we propose a generic framework to reduce the noise level of training data for the training...

Bibliographic Details
Main Authors: XU, Jiayun, LI, Yingjiu, DENG, Robert H.
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2021
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/6551
https://ink.library.smu.edu.sg/context/sis_research/article/7554/viewcontent/Differential_training_A_generic_framework_to_reduce_label_noises_for_Android_malware_detection.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-7554
record_format dspace
spelling sg-smu-ink.sis_research-7554 2022-01-10T03:38:49Z Differential training: A generic framework to reduce label noises for Android malware detection XU, Jiayun LI, Yingjiu DENG, Robert H. A common problem in machine learning-based malware detection is that training data may contain noisy labels, and it is challenging to make the training data noise-free at a large scale. To address this problem, we propose a generic framework to reduce the noise level of training data for the training of any machine learning-based Android malware detection approach. Our framework makes use of all intermediate states of two identical deep learning classification models during their training with a given noisy training dataset and generates a noise-detection feature vector for each input sample. Our framework then applies a set of outlier detection algorithms to all noise-detection feature vectors to reduce the noise level of the given training data before feeding it to any machine learning-based Android malware detection approach. In our experiments with three different Android malware detection approaches, our framework can detect significant portions of wrong labels in different training datasets at different noise ratios, and improve the performance of Android malware detection approaches. 2021-02-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/6551 info:doi/10.14722/ndss.2021.24126 https://ink.library.smu.edu.sg/context/sis_research/article/7554/viewcontent/Differential_training_A_generic_framework_to_reduce_label_noises_for_Android_malware_detection.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems Information Security
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Databases and Information Systems
Information Security
description A common problem in machine learning-based malware detection is that training data may contain noisy labels, and it is challenging to make the training data noise-free at a large scale. To address this problem, we propose a generic framework to reduce the noise level of training data for the training of any machine learning-based Android malware detection approach. Our framework makes use of all intermediate states of two identical deep learning classification models during their training with a given noisy training dataset and generates a noise-detection feature vector for each input sample. Our framework then applies a set of outlier detection algorithms to all noise-detection feature vectors to reduce the noise level of the given training data before feeding it to any machine learning-based Android malware detection approach. In our experiments with three different Android malware detection approaches, our framework can detect significant portions of wrong labels in different training datasets at different noise ratios, and improve the performance of Android malware detection approaches.
format text
author XU, Jiayun
LI, Yingjiu
DENG, Robert H.
title Differential training: A generic framework to reduce label noises for Android malware detection
publisher Institutional Knowledge at Singapore Management University
publishDate 2021
_version_ 1770575986270666752