Finding a needle in a haystack: Automatic mining of silent vulnerability fixes

Under the coordinated vulnerability disclosure model, it is recommended that a vulnerability in open source software (OSS) be fixed “silently”, without disclosing the fix until the vulnerability itself is disclosed. Yet it is crucial for OSS users to be aware of vulnerability fixes as early as possible, as o...

Bibliographic Details
Main Authors:	ZHOU, Jiayuan, PACHECO, Michael, WAN, Zhiyuan, XIA, Xin, LO, David, WANG, Yuan, HASSAN, Ahmed E.
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2021
Subjects:	Databases and Information Systems
Online Access:	https://ink.library.smu.edu.sg/sis_research/6896 https://ink.library.smu.edu.sg/context/sis_research/article/7899/viewcontent/Finding_A_Needle_in_a_Haystack.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-7899
record_format	dspace
spelling	sg-smu-ink.sis_research-7899 2022-02-07T10:53:55Z Finding a needle in a haystack: Automatic mining of silent vulnerability fixes ZHOU, Jiayuan PACHECO, Michael WAN, Zhiyuan XIA, Xin LO, David WANG, Yuan HASSAN, Ahmed E. 2021-11-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/6896 https://ink.library.smu.edu.sg/context/sis_research/article/7899/viewcontent/Finding_A_Needle_in_a_Haystack.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Databases and Information Systems
description	Under the coordinated vulnerability disclosure model, it is recommended that a vulnerability in open source software (OSS) be fixed “silently”, without disclosing the fix until the vulnerability itself is disclosed. Yet it is crucial for OSS users to be aware of vulnerability fixes as early as possible, because once a vulnerability fix is pushed to the source code repository, a malicious party could probe for the corresponding vulnerability and exploit it. In practice, OSS users often rely on vulnerability disclosure information from security advisories (e.g., the National Vulnerability Database) to learn of vulnerability fixes. However, the time between the availability of a vulnerability fix and its disclosure can vary from days to months and, in some cases, even years. Due to manpower constraints and a lack of expert knowledge, it is infeasible for OSS users to manually analyze all code changes to detect vulnerability fixes. Therefore, it is essential to identify vulnerability fixes automatically and promptly. In a first-of-its-kind study, we propose VulFixMiner, a Transformer-based approach capable of automatically extracting semantic meaning from commit-level code changes to identify silent vulnerability fixes. We construct our model using sampled commits from 204 projects, and evaluate it using the full set of commits from 52 additional projects. The evaluation results show that VulFixMiner outperforms various state-of-the-art baselines in terms of AUC (i.e., 0.81 and 0.73 on the Java and Python datasets, respectively) and two effort-aware performance metrics (i.e., EffortCost and Popt). In particular, with an effort of inspecting 5% of total LOC, VulFixMiner can identify 49% of total vulnerability fixes. Additionally, through manual verification of sampled commits that were identified as vulnerability fixes but not marked as such in our dataset, we observe that 35% (29 out of 82) of the commits do fix vulnerabilities, indicating that VulFixMiner is also capable of identifying unreported vulnerability fixes.
format	text
author	ZHOU, Jiayuan PACHECO, Michael WAN, Zhiyuan XIA, Xin LO, David WANG, Yuan HASSAN, Ahmed E.
author_sort	ZHOU, Jiayuan
title	Finding a needle in a haystack: Automatic mining of silent vulnerability fixes
title_sort	finding a needle in a haystack: automatic mining of silent vulnerability fixes
publisher	Institutional Knowledge at Singapore Management University
publishDate	2021
url	https://ink.library.smu.edu.sg/sis_research/6896 https://ink.library.smu.edu.sg/context/sis_research/article/7899/viewcontent/Finding_A_Needle_in_a_Haystack.pdf
_version_	1770576115396509696
