A machine learning approach for vulnerability curation

Software composition analysis depends on a database of open-source library vulnerabilities, curated by security researchers using various sources, such as bug tracking systems, commits, and mailing lists. We report the design and implementation of a machine learning system to help the curation by au...

Bibliographic Details
Main Authors: CHEN, Yang; SANTOSA, Andrew E.; ANG, Ming Yi; SHARMA, Abhishek; SHARMA, Asankhaya; LO, David
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2020
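Subjects: application security; open-source software; machine learning; classifiers ensemble; self-training; Artificial Intelligence and Robotics; Software Engineering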
Online Access: https://ink.library.smu.edu.sg/sis_research/5627
https://ink.library.smu.edu.sg/context/sis_research/article/6630/viewcontent/A_Machine_Learning_Approach_for_Vulnerability_Curation.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-6630
record_format dspace
spelling sg-smu-ink.sis_research-6630 2021-05-11T08:01:13Z A machine learning approach for vulnerability curation CHEN, Yang SANTOSA, Andrew E. ANG, Ming Yi SHARMA, Abhishek SHARMA, Asankhaya LO, David
2020-06-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/5627 info:doi/10.1145/3379597.3387461 https://ink.library.smu.edu.sg/context/sis_research/article/6630/viewcontent/A_Machine_Learning_Approach_for_Vulnerability_Curation.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University application security open-source software machine learning classifiers ensemble self-training Artificial Intelligence and Robotics Software Engineering
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic application security
open-source software
machine learning
classifiers ensemble
self-training
Artificial Intelligence and Robotics
Software Engineering
description Software composition analysis depends on a database of open-source library vulnerabilities, curated by security researchers using various sources, such as bug tracking systems, commits, and mailing lists. We report the design and implementation of a machine learning system to help the curation by automatically predicting the vulnerability-relatedness of each data item. It supports a complete pipeline from data collection, model training and prediction, to the validation of new models before deployment. It is executed iteratively to generate better models as new input data become available. We use self-training to significantly and automatically increase the size of the training dataset, opportunistically maximizing the improvement in the models' quality at each iteration. We devised a new deployment stability metric to evaluate the quality of the new models before deployment into production, which helped to discover an error. We experimentally evaluate the improvement in the performance of the models in one iteration, with a maximum PR AUC improvement of 27.59%. Ours is the first such study across a variety of data sources. We discover that adding the features of the corresponding commits to the features of issues/pull requests improves the precision for the recall values that matter. We demonstrate the effectiveness of self-training alone, with a 10.50% PR AUC improvement, and we discover that there is no uniform ordering of word2vec parameter sensitivity across data sources.
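The self-training step mentioned in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not specify their classifiers or confidence rule, so the one-dimensional nearest-centroid classifier, the `threshold` parameter, and all function names below are assumptions chosen only to show the general idea of growing a training set with confidently pseudo-labeled items.

```python
from statistics import mean


def fit_centroids(xs, ys):
    """Return the mean feature value of each class (labels 0 and 1)."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in (0, 1)}


def predict_with_confidence(centroids, x):
    """Predict the class of the nearer centroid.

    Confidence is the farther distance divided by the sum of both
    distances, so it ranges from 0.5 (equidistant) to 1.0.
    """
    d0 = abs(x - centroids[0])
    d1 = abs(x - centroids[1])
    if d0 + d1 == 0:
        return 0, 0.5
    label = 0 if d0 <= d1 else 1
    return label, max(d0, d1) / (d0 + d1)


def self_train(labeled_x, labeled_y, unlabeled_x, threshold=0.8, max_rounds=10):
    """Grow the training set with confidently pseudo-labeled items.

    Each round refits the classifier, moves every unlabeled item whose
    prediction confidence reaches `threshold` into the training set,
    and stops when a round adds nothing or `max_rounds` is reached.
    """
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        centroids = fit_centroids(xs, ys)
        remaining = []
        added = 0
        for x in pool:
            label, confidence = predict_with_confidence(centroids, x)
            if confidence >= threshold:
                xs.append(x)
                ys.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:
            break
    return fit_centroids(xs, ys), xs, ys, pool
```

Calling `self_train([0.0, 1.0], [0, 1], [0.1, 0.15, 0.85, 0.9, 0.5])` would move the four items near either seed into the training set and leave the ambiguous item `0.5` unlabeled. A real system would replace the toy classifier with trained models over text features and would validate the retrained models before deployment, as the description notes.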
format text
author CHEN, Yang
SANTOSA, Andrew E.
ANG, Ming Yi
SHARMA, Abhishek
SHARMA, Asankhaya
LO, David
title A machine learning approach for vulnerability curation
publisher Institutional Knowledge at Singapore Management University
publishDate 2020
url https://ink.library.smu.edu.sg/sis_research/5627
https://ink.library.smu.edu.sg/context/sis_research/article/6630/viewcontent/A_Machine_Learning_Approach_for_Vulnerability_Curation.pdf