Enhancing code vulnerability detection via vulnerability-preserving data augmentation

Source code vulnerability detection aims to identify inherent vulnerabilities to safeguard software systems from potential attacks. Many prior studies overlook diverse vulnerability characteristics, simplifying the problem into a binary (0-1) classification task, for example, determining whether it is...

Bibliographic Details
Main Authors:	LIU, Shangqing; MA, Wei; WANG, Jian; XIE, Xiaofei; FENG, Ruitao; LIU, Yang
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Graph Neural Networks; Vulnerability Detection; Information Security
Online Access:	https://ink.library.smu.edu.sg/sis_research/9038 https://ink.library.smu.edu.sg/context/sis_research/article/10041/viewcontent/3652032.3657564_pvoa_cc_by.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-10041
record_format	dspace
spelling	sg-smu-ink.sis_research-10041 2024-07-25T07:56:26Z 2024-06-01T07:00:00Z text application/pdf info:doi/10.1145/3652032.3657564 http://creativecommons.org/licenses/by/3.0/ Research Collection School Of Computing and Information Systems eng
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Graph Neural Networks; Vulnerability Detection; Information Security
description	Source code vulnerability detection aims to identify inherent vulnerabilities to safeguard software systems from potential attacks. Many prior studies overlook diverse vulnerability characteristics, simplifying the problem into a binary (0-1) classification task, for example, determining whether a piece of code is vulnerable or not. This makes it challenging for a single deep-learning-based model to effectively learn the wide array of vulnerability characteristics. Furthermore, due to the challenges associated with collecting large-scale vulnerability data, these detectors often overfit limited training datasets, resulting in lower model generalization performance. To address these challenges, we introduce a fine-grained vulnerability detector named FGVulDet. Unlike previous approaches, FGVulDet employs multiple classifiers to discern the characteristics of various vulnerability types and combines their outputs to identify the specific type of vulnerability. Each classifier is designed to learn type-specific vulnerability semantics. Additionally, to address the scarcity of data for some vulnerability types and to enhance data diversity for learning better vulnerability semantics, we propose a novel vulnerability-preserving data augmentation technique that increases the number of vulnerability samples. Taking inspiration from recent advancements in graph neural networks for learning program semantics, we incorporate a Gated Graph Neural Network (GGNN) and extend it to an edge-aware GGNN to capture edge-type information. FGVulDet is trained on a large-scale dataset from GitHub encompassing five different types of vulnerabilities. Extensive experiments comparing it with static-analysis-based and learning-based approaches demonstrate the effectiveness of FGVulDet.
format	text
author	LIU, Shangqing; MA, Wei; WANG, Jian; XIE, Xiaofei; FENG, Ruitao; LIU, Yang
title	Enhancing code vulnerability detection via vulnerability-preserving data augmentation
publisher	Institutional Knowledge at Singapore Management University
publishDate	2024
url	https://ink.library.smu.edu.sg/sis_research/9038 https://ink.library.smu.edu.sg/context/sis_research/article/10041/viewcontent/3652032.3657564_pvoa_cc_by.pdf
_version_	1814047714591113216
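
Note on the description: the abstract states that FGVulDet combines the outputs of several type-specific classifiers to identify the vulnerability type, but this record does not say how the outputs are combined. The following is only a minimal sketch of one plausible scheme, assuming each classifier emits a score in [0, 1] and the highest score above a threshold wins. The function name, the threshold, and the type labels are hypothetical and are not taken from the paper.

```python
from typing import Dict


def combine_type_scores(scores: Dict[str, float], threshold: float = 0.5) -> str:
    """Pick a vulnerability type from per-type classifier scores.

    Assumption (not from the record): each type-specific classifier
    returns a score in [0, 1]; the type with the highest score is
    reported if that score reaches the threshold, otherwise the
    sample is treated as non-vulnerable.
    """
    if not scores:
        return "non-vulnerable"
    # Type whose classifier is most confident.
    best_type = max(scores, key=scores.get)
    if scores[best_type] >= threshold:
        return best_type
    return "non-vulnerable"


if __name__ == "__main__":
    # Hypothetical type labels, used only for illustration.
    example = {"type_a": 0.12, "type_b": 0.81, "type_c": 0.40}
    print(combine_type_scores(example))
```

A maximum-with-threshold rule is only one option; a voting scheme or a learned combiner would fit the same description in the abstract.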
