A machine learning approach for vulnerability curation
Software composition analysis depends on a database of open-source library vulnerabilities, curated by security researchers using various sources, such as bug tracking systems, commits, and mailing lists. We report the design and implementation of a machine learning system to help the curation by au...
Main Authors: | CHEN, Yang; SANTOSA, Andrew E.; ANG, Ming Yi; SHARMA, Abhishek; SHARMA, Asankhaya; LO, David |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2020 |
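Subjects: | application security; open-source software; machine learning; classifiers ensemble; self-training; Artificial Intelligence and Robotics; Software Engineering |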
Online Access: | https://ink.library.smu.edu.sg/sis_research/5627 https://ink.library.smu.edu.sg/context/sis_research/article/6630/viewcontent/A_Machine_Learning_Approach_for_Vulnerability_Curation.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-6630 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-6630 2021-05-11T08:01:13Z A machine learning approach for vulnerability curation CHEN, Yang SANTOSA, Andrew E. ANG, Ming Yi SHARMA, Abhishek SHARMA, Asankhaya LO, David
2020-06-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/5627 info:doi/10.1145/3379597.3387461 https://ink.library.smu.edu.sg/context/sis_research/article/6630/viewcontent/A_Machine_Learning_Approach_for_Vulnerability_Curation.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University application security open-source software machine learning classifiers ensemble self-training Artificial Intelligence and Robotics Software Engineering |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
application security open-source software machine learning classifiers ensemble self-training Artificial Intelligence and Robotics Software Engineering |
description |
Software composition analysis depends on a database of open-source library vulnerabilities, curated by security researchers using various sources, such as bug tracking systems, commits, and mailing lists. We report the design and implementation of a machine learning system that helps the curation by automatically predicting the vulnerability-relatedness of each data item. It supports a complete pipeline from data collection, model training, and prediction to the validation of new models before deployment. It is executed iteratively to generate better models as new input data become available. We use self-training to significantly and automatically increase the size of the training dataset, opportunistically maximizing the improvement in the models' quality at each iteration. We devised a new deployment stability metric to evaluate the quality of new models before deployment into production, which helped to discover an error. We experimentally evaluate the improvement in the performance of the models in one iteration, with a maximum PR AUC improvement of 27.59%. Ours is the first such study across a variety of data sources. We discover that adding the features of the corresponding commits to the features of issues/pull requests improves the precision for the recall values that matter. We demonstrate the effectiveness of self-training alone, with a 10.50% PR AUC improvement, and we discover that there is no uniform ordering of word2vec parameter sensitivity across data sources. |
format |
text |
author |
CHEN, Yang SANTOSA, Andrew E. ANG, Ming Yi SHARMA, Abhishek SHARMA, Asankhaya LO, David |
author_sort |
CHEN, Yang |
title |
A machine learning approach for vulnerability curation |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2020 |
url |
https://ink.library.smu.edu.sg/sis_research/5627 https://ink.library.smu.edu.sg/context/sis_research/article/6630/viewcontent/A_Machine_Learning_Approach_for_Vulnerability_Curation.pdf |
_version_ |
1770575534137278464 |
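
The abstract names two ideas that a short example may help make concrete: self-training (growing the training set with the model's own confident predictions) and PR AUC as the quality measure. The following is a minimal sketch only, not the authors' implementation. The nearest-centroid classifier, the two-dimensional toy features, the 0.8 confidence threshold, and every function name are assumptions made for illustration; this record does not state the classifiers, features, or thresholds used in the paper. PR AUC is approximated here by average precision, one common step-wise estimate.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def fit(labeled):
    """Fit a nearest-centroid stand-in model: one centroid per class (0, 1).

    Assumes both classes appear at least once in `labeled`.
    """
    return {y: centroid([x for x, lab in labeled if lab == y]) for y in (0, 1)}


def score(model, x):
    """Score in [0, 1]; higher means closer to the positive-class centroid."""
    d0, d1 = dist(x, model[0]), dist(x, model[1])
    return d0 / (d0 + d1) if d0 + d1 else 0.5


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Repeatedly move confidently scored unlabeled items into the training set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = fit(labeled)
        confident, rest = [], []
        for x in pool:
            s = score(model, x)
            if s >= threshold:
                confident.append((x, 1))   # pseudo-label: vulnerability-related
            elif s <= 1 - threshold:
                confident.append((x, 0))   # pseudo-label: unrelated
            else:
                rest.append(x)             # too uncertain, stays unlabeled
        if not confident:
            break
        labeled.extend(confident)
        pool = rest
    return fit(labeled), labeled


def average_precision(scores, labels):
    """Step-wise estimate of the area under the precision-recall curve."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    positives = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank           # precision at each recalled positive
    return total / positives if positives else 0.0


# Toy data (hypothetical): two seed labels and five unlabeled items.
seed = [((0.0, 0.0), 0), ((1.0, 1.0), 1)]
unlabeled = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (0.8, 0.9), (0.5, 0.5)]
model, grown = self_train(seed, unlabeled)

# Evaluate on a small held-out set (hypothetical labels).
held_out = [((0.95, 0.9), 1), ((0.1, 0.0), 0), ((0.7, 0.8), 1), ((0.3, 0.2), 0)]
ap = average_precision([score(model, x) for x, _ in held_out],
                       [y for _, y in held_out])
print(len(grown), ap)
```

In this sketch the ambiguous point `(0.5, 0.5)` never crosses the threshold, so it is left unlabeled rather than forced into a class. A production pipeline like the one the abstract describes would additionally validate each new model before deployment.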