A machine learning approach for vulnerability curation

Software composition analysis depends on a database of open-source library vulnerabilities, curated by security researchers using various sources, such as bug tracking systems, commits, and mailing lists. We report the design and implementation of a machine learning system to help the curation by au...

Bibliographic Details
Main Authors:	CHEN, Yang; SANTOSA, Andrew E.; ANG, Ming Yi; SHARMA, Abhishek; SHARMA, Asankhaya; LO, David
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2020
Subjects:	application security; open-source software; machine learning; classifiers ensemble; self-training; Artificial Intelligence and Robotics; Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/5627 https://ink.library.smu.edu.sg/context/sis_research/article/6630/viewcontent/A_Machine_Learning_Approach_for_Vulnerability_Curation.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-6630
record_format	dspace
spelling	sg-smu-ink.sis_research-6630 2021-05-11T08:01:13Z
2020-06-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/5627 info:doi/10.1145/3379597.3387461 https://ink.library.smu.edu.sg/context/sis_research/article/6630/viewcontent/A_Machine_Learning_Approach_for_Vulnerability_Curation.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University application security open-source software machine learning classifiers ensemble self-training Artificial Intelligence and Robotics Software Engineering
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	application security; open-source software; machine learning; classifiers ensemble; self-training; Artificial Intelligence and Robotics; Software Engineering
description	Software composition analysis depends on a database of open-source library vulnerabilities, curated by security researchers using various sources, such as bug tracking systems, commits, and mailing lists. We report the design and implementation of a machine learning system to help the curation by automatically predicting the vulnerability-relatedness of each data item. It supports a complete pipeline from data collection, model training and prediction, to the validation of new models before deployment. It is executed iteratively to generate better models as new input data become available. We use self-training to significantly and automatically increase the size of the training dataset, opportunistically maximizing the improvement in the models' quality at each iteration. We devised a new deployment stability metric to evaluate the quality of the new models before deployment into production, which helped to discover an error. We experimentally evaluate the improvement in the performance of the models in one iteration, with a maximum PR AUC improvement of 27.59%. Ours is the first such study across a variety of data sources. We discover that adding the features of the corresponding commits to the features of issues/pull requests improves the precision for the recall values that matter. We demonstrate the effectiveness of self-training alone, with a 10.50% PR AUC improvement, and we discover that there is no uniform ordering of word2vec parameter sensitivity across data sources.
format	text
author	CHEN, Yang; SANTOSA, Andrew E.; ANG, Ming Yi; SHARMA, Abhishek; SHARMA, Asankhaya; LO, David
author_facet	CHEN, Yang; SANTOSA, Andrew E.; ANG, Ming Yi; SHARMA, Abhishek; SHARMA, Asankhaya; LO, David
author_sort	CHEN, Yang
title	A machine learning approach for vulnerability curation
title_short	A machine learning approach for vulnerability curation
title_full	A machine learning approach for vulnerability curation
title_fullStr	A machine learning approach for vulnerability curation
title_full_unstemmed	A machine learning approach for vulnerability curation
title_sort	machine learning approach for vulnerability curation
publisher	Institutional Knowledge at Singapore Management University
publishDate	2020
url	https://ink.library.smu.edu.sg/sis_research/5627 https://ink.library.smu.edu.sg/context/sis_research/article/6630/viewcontent/A_Machine_Learning_Approach_for_Vulnerability_Curation.pdf
_version_	1770575534137278464
