Enhancing security patch identification by capturing structures in commits

With the rapid increasing number of open source software (OSS), the majority of the software vulnerabilities in the open source components are fixed silently, which leads to the deployed software that integrated them being unable to get a timely update. Hence, it is critical to design a security pat...

Full description

Saved in:
Bibliographic Details
Main Authors: WU, Bozhi, LIU, Shangqing, FENG, Ruitao, XIE, Xiaofei, SIOW, Jingkai, LIN, Shang-Wei
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2022
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/7500
https://ink.library.smu.edu.sg/context/sis_research/article/8503/viewcontent/2207.09022.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English
Description
Summary:With the rapid increasing number of open source software (OSS), the majority of the software vulnerabilities in the open source components are fixed silently, which leads to the deployed software that integrated them being unable to get a timely update. Hence, it is critical to design a security patch identification system to ensure the security of the utilized software. However, most of the existing works for security patch identification just consider the changed code and the commit message of a commit as a flat sequence of tokens with simple neural networks to learn its semantics, while the structure information is ignored. To address these limitations, in this paper, we propose our well-designed approach E-SPI, which extracts the structure information hidden in a commit for effective identification. Specifically, it consists of the code change encoder to extract the syntactic of the changed code with the BiLSTM to learn the code representation and the message encoder to construct the dependency graph for the commit message with the graph neural network (GNN) to learn the message representation. We further enhance the code change encoder by embedding contextual information related to the changed code. To demonstrate the effectiveness of our approach, we conduct the extensive experiments against six state-of-the-art approaches on the existing dataset and from the real deployment environment. The experimental results confirm that our approach can significantly outperform current state-of-the-art baselines.