Fine-grained commit-level vulnerability type prediction by CWE tree structure

Identifying security patches via code commits to allow early warnings and timely fixes for Open Source Software (OSS) has received increasing attention. However, the existing detection methods can only identify the presence of a patch (i.e., a binary classification) but fail to pinpoint the vulnerab...

Bibliographic Details
Main Authors: PAN, Shengyi, BAO, Lingfeng, XIA, Xin, LO, David, LI, Shanping
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2023
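Subjects: Codes; Data integrity; Computer architecture; Inference algorithms; Classification algorithms; Software security; Task analysis; Common Weakness Enumeration; Artificial Intelligence and Robotics; Information Security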
Online Access: https://ink.library.smu.edu.sg/sis_research/8511
https://ink.library.smu.edu.sg/context/sis_research/article/9514/viewcontent/ICSE2023.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-9514
record_format dspace
spelling sg-smu-ink.sis_research-9514 2024-01-22T15:10:22Z Fine-grained commit-level vulnerability type prediction by CWE tree structure PAN, Shengyi BAO, Lingfeng XIA, Xin LO, David LI, Shanping Identifying security patches via code commits to allow early warnings and timely fixes for Open Source Software (OSS) has received increasing attention. However, the existing detection methods can only identify the presence of a patch (i.e., a binary classification) but fail to pinpoint the vulnerability type. In this work, we take the first step to categorize the security patches into fine-grained vulnerability types. Specifically, we use the Common Weakness Enumeration (CWE) as the label and perform fine-grained classification using categories at the third level of the CWE tree. We first formulate the task as a Hierarchical Multi-label Classification (HMC) problem, i.e., inferring a path (a sequence of CWE nodes) from the root of the CWE tree to the node at the target depth. We then propose an approach named TreeVul with a hierarchical and chained architecture, which manages to utilize the structure information of the CWE tree as prior knowledge of the classification task. We further propose a tree-structure-aware, beam-search-based inference algorithm for retrieving the optimal path with the highest merged probability. We collect a large security patch dataset from NVD, consisting of 6,541 commits from 1,560 GitHub OSS repositories. Experimental results show that TreeVul significantly outperforms the best-performing baselines, with improvements of 5.9%, 25.0%, and 7.7% in terms of weighted F1-score, macro F1-score, and MCC, respectively. We further conduct a user study and a case study to verify the practical value of TreeVul in enriching the binary patch detection results and improving the data quality of NVD, respectively.
2023-05-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/8511 info:doi/10.1109/ICSE48619.2023.00088 https://ink.library.smu.edu.sg/context/sis_research/article/9514/viewcontent/ICSE2023.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Codes Data integrity Computer architecture Inference algorithms Classification algorithms Software security Task analysis Common Weakness Enumeration Artificial Intelligence and Robotics Information Security
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Codes
Data integrity
Computer architecture
Inference algorithms
Classification algorithms
Software security
Task analysis
Common Weakness Enumeration
Artificial Intelligence and Robotics
Information Security
spellingShingle Codes
Data integrity
Computer architecture
Inference algorithms
Classification algorithms
Software security
Task analysis
Common Weakness Enumeration
Artificial Intelligence and Robotics
Information Security
PAN, Shengyi
BAO, Lingfeng
XIA, Xin
LO, David
LI, Shanping
Fine-grained commit-level vulnerability type prediction by CWE tree structure
description Identifying security patches via code commits to allow early warnings and timely fixes for Open Source Software (OSS) has received increasing attention. However, the existing detection methods can only identify the presence of a patch (i.e., a binary classification) but fail to pinpoint the vulnerability type. In this work, we take the first step to categorize the security patches into fine-grained vulnerability types. Specifically, we use the Common Weakness Enumeration (CWE) as the label and perform fine-grained classification using categories at the third level of the CWE tree. We first formulate the task as a Hierarchical Multi-label Classification (HMC) problem, i.e., inferring a path (a sequence of CWE nodes) from the root of the CWE tree to the node at the target depth. We then propose an approach named TreeVul with a hierarchical and chained architecture, which manages to utilize the structure information of the CWE tree as prior knowledge of the classification task. We further propose a tree-structure-aware, beam-search-based inference algorithm for retrieving the optimal path with the highest merged probability. We collect a large security patch dataset from NVD, consisting of 6,541 commits from 1,560 GitHub OSS repositories. Experimental results show that TreeVul significantly outperforms the best-performing baselines, with improvements of 5.9%, 25.0%, and 7.7% in terms of weighted F1-score, macro F1-score, and MCC, respectively. We further conduct a user study and a case study to verify the practical value of TreeVul in enriching the binary patch detection results and improving the data quality of NVD, respectively.
format text
author PAN, Shengyi
BAO, Lingfeng
XIA, Xin
LO, David
LI, Shanping
author_facet PAN, Shengyi
BAO, Lingfeng
XIA, Xin
LO, David
LI, Shanping
author_sort PAN, Shengyi
title Fine-grained commit-level vulnerability type prediction by CWE tree structure
title_short Fine-grained commit-level vulnerability type prediction by CWE tree structure
title_full Fine-grained commit-level vulnerability type prediction by CWE tree structure
title_fullStr Fine-grained commit-level vulnerability type prediction by CWE tree structure
title_full_unstemmed Fine-grained commit-level vulnerability type prediction by CWE tree structure
title_sort fine-grained commit-level vulnerability type prediction by cwe tree structure
publisher Institutional Knowledge at Singapore Management University
publishDate 2023
url https://ink.library.smu.edu.sg/sis_research/8511
https://ink.library.smu.edu.sg/context/sis_research/article/9514/viewcontent/ICSE2023.pdf
_version_ 1789483256312233984
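
Illustrative note on the description field: the abstract frames type prediction as inferring a root-to-depth-3 path in the CWE tree and retrieving the path with the highest merged probability via beam search. The sketch below shows that general idea only and is not the TreeVul implementation. The tree fragment, the node names, the per-level probabilities, and the function name `beam_search_path` are all hypothetical, and "merged probability" is assumed here to be the product of per-level conditional probabilities, which the record does not specify.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical toy tree (parent -> children). NOT the real CWE hierarchy.
TOY_TREE: Dict[str, List[str]] = {
    "ROOT": ["A", "B"],
    "A": ["A1", "A2"],
    "B": ["B1", "B2"],
    "A1": ["A1x", "A1y"],
    "A2": ["A2x"],
    "B1": ["B1x", "B1y"],
    "B2": ["B2x"],
}

# Hypothetical classifier outputs P(child | parent, commit) for one commit.
TOY_PROBS: Dict[str, float] = {
    "A": 0.55, "B": 0.45,
    "A1": 0.6, "A2": 0.4,
    "B1": 0.9, "B2": 0.1,
    "A1x": 0.6, "A1y": 0.4, "A2x": 1.0,
    "B1x": 0.95, "B1y": 0.05, "B2x": 1.0,
}


def beam_search_path(
    tree: Dict[str, List[str]],
    child_prob: Callable[[List[str], str], float],
    depth: int,
    beam_width: int,
) -> List[Tuple[List[str], float]]:
    """Return up to `beam_width` root-to-depth paths, best first.

    Structure awareness: a path is only ever extended with children of its
    last node, so every returned path is valid in the tree.
    Assumption: path score = product of per-level probabilities.
    """
    beams: List[Tuple[List[str], float]] = [(["ROOT"], 0.0)]  # (path, log-prob)
    for _ in range(depth):
        candidates: List[Tuple[List[str], float]] = []
        for path, logp in beams:
            for child in tree.get(path[-1], []):
                p = child_prob(path, child)
                if p > 0.0:
                    candidates.append((path + [child], logp + math.log(p)))
        if not candidates:  # no beam could be extended further
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return [(path, math.exp(logp)) for path, logp in beams]


def toy_prob(path: List[str], child: str) -> float:
    """Look up the hypothetical probability of `child` given its parent."""
    return TOY_PROBS[child]


if __name__ == "__main__":
    for width in (1, 2):
        best_path, best_score = beam_search_path(TOY_TREE, toy_prob, 3, width)[0]
        print(width, " -> ".join(best_path), round(best_score, 5))
```

With these made-up numbers, a beam width of 1 (greedy decoding) commits to branch A at the first level and ends at A1x, whereas a width of 2 keeps branch B alive and recovers the higher-scoring path through B1 to B1x. That is the usual motivation for beam search over level-by-level greedy prediction.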