CHRONOS: Time-aware zero-shot identification of libraries from vulnerability reports

Bibliographic Details
Main Authors: LYU, Yunbo; CONG, Thanh Le; KANG, Hong Jin; WIDYASARI, Ratnadira; ZHAO, Zhipeng; LE, Xuan-Bach Dinh; LI, Ming; LO, David
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2023
Online Access: https://ink.library.smu.edu.sg/sis_research/8512
https://ink.library.smu.edu.sg/context/sis_research/article/9515/viewcontent/2301.03944__1_.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-9515
record_format dspace
spelling sg-smu-ink.sis_research-9515 2024-01-22T15:09:59Z 2023-05-01T07:00:00Z text application/pdf info:doi/10.1109/ICSE48619.2023.00094 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
topic Extreme multi-label classification
Library identification
Unseen labels
Vulnerability reports
Zero-shot learning
Artificial Intelligence and Robotics
Databases and Information Systems
Graphics and Human Computer Interfaces
description Tools that alert developers about library vulnerabilities depend on accurate, up-to-date vulnerability databases which are maintained by security researchers. These databases record the libraries related to each vulnerability. However, the vulnerability reports may not explicitly list every library and human analysis is required to determine all the relevant libraries. Human analysis may be slow and expensive, which motivates the need for automated approaches. Researchers and practitioners have proposed to automatically identify libraries from vulnerability reports using extreme multi-label learning (XML). While state-of-the-art XML techniques showed promising performance, their experimental settings do not practically fit what happens in reality. Previous studies randomly split the vulnerability reports data for training and testing their models without considering the chronological order of the reports. This may unduly train the models on chronologically newer reports while testing the models on chronologically older ones. However, in practice, one often receives chronologically new reports, which may be related to previously unseen libraries. Under this practical setting, we observe that the performance of current XML techniques declines substantially, e.g., F1 decreased from 0.7 to 0.24 under experiments without and with consideration of chronological order of vulnerability reports. We propose a practical library identification approach, namely Chronos, based on zero-shot learning. The novelty of Chronos is three-fold. First, Chronos fits into the practical pipeline by considering the chronological order of vulnerability reports. Second, Chronos enriches the data of the vulnerability descriptions and labels using a carefully designed data enhancement step. Third, Chronos exploits the temporal ordering of the vulnerability reports using a cache to prioritize prediction of versions of libraries that recently had reports of vulnerabilities. 
In our experiments, Chronos achieves an average F1-score of 0.75, 3x better than the best XML-based approach. Data enhancement and the time-aware adjustment improve Chronos over the vanilla zero-shot learning model by 27% in average F1.
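The record describes the time-aware side of Chronos only at a high level. The sketch below is not the authors' implementation; it only illustrates two ideas from the abstract under stated assumptions: a chronological train/test split (older reports for training, newer ones for testing, no shuffling) and a recency cache that raises the scores of libraries seen in recent reports. The names `Report`, `chronological_split`, `RecencyCache`, and `rerank` are hypothetical, as are the additive `boost`, the cache `capacity`, the LRU eviction policy, and the example scores; the actual Chronos adjustment may work differently.

```python
from collections import OrderedDict
from dataclasses import dataclass, field
from datetime import date

# Hypothetical sketch, not the Chronos implementation.


@dataclass
class Report:
    # Hypothetical minimal record: id, publication date, confirmed library labels.
    report_id: str
    published: date
    libraries: list = field(default_factory=list)


def chronological_split(reports, train_fraction=0.8):
    """Sort by publication date and cut once: older reports train, newer ones test."""
    ordered = sorted(reports, key=lambda r: r.published)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


class RecencyCache:
    """Assumed fixed-size cache of recently reported libraries with LRU eviction."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self._entries = OrderedDict()

    def update(self, libraries):
        for lib in libraries:
            self._entries.pop(lib, None)  # re-insert so lib becomes the most recent
            self._entries[lib] = True
            if len(self._entries) > self.capacity:
                self._entries.popitem(last=False)  # drop the least recently reported

    def __contains__(self, lib):
        return lib in self._entries


def rerank(scores, cache, boost=0.1):
    """Add an assumed constant boost to cached libraries, then sort by adjusted score."""
    adjusted = {lib: s + (boost if lib in cache else 0.0) for lib, s in scores.items()}
    return sorted(adjusted, key=adjusted.get, reverse=True)


# Usage: train on the two oldest reports, then re-rank scores for a newer report.
reports = [
    Report("r3", date(2022, 3, 1), ["lib-c"]),
    Report("r1", date(2021, 1, 5), ["lib-a"]),
    Report("r2", date(2021, 6, 9), ["lib-b"]),
    Report("r4", date(2022, 8, 2), ["lib-a"]),
]
train, test = chronological_split(reports, train_fraction=0.5)

cache = RecencyCache(capacity=2)
for report in train:
    cache.update(report.libraries)

model_scores = {"lib-a": 0.50, "lib-b": 0.45, "lib-c": 0.52}  # made-up model scores
ranking = rerank(model_scores, cache)
print([r.report_id for r in train], [r.report_id for r in test], ranking)
```

With `boost=0.0` the ranking follows the raw scores (`lib-c` first); with the boost, the two libraries from the more recent training reports move ahead. In this sketch, a streaming setting would call `cache.update` again as each new report's libraries are confirmed, so the cache keeps tracking the latest reports.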