Detection and remediation of third-party library vulnerabilities in the C/C++ open-source ecosystem

Open-source software (OSS) has gained significant importance in software development. In the OSS ecosystem, third-party libraries (TPLs) play a crucial role in enhancing development efficiency. However, these libraries could introduce potential security risks to software projects. To mitigate and man...

Bibliographic Details
Main Author: Wu, Jiahui
Other Authors: Liu Yang
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2025
Subjects: Computer and Information Science; Software engineering
Online Access: https://hdl.handle.net/10356/182864
Institution: Nanyang Technological University
id sg-ntu-dr.10356-182864
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
Software engineering
description Open-source software (OSS) has gained significant importance in software development. In the OSS ecosystem, third-party libraries (TPLs) play a crucial role in enhancing development efficiency. However, these libraries can introduce security risks into software projects. To mitigate and manage these risks, including Common Vulnerabilities and Exposures (CVEs), it is crucial to detect the employed TPLs and thereby identify and address the associated risks. For TPL detection, our first study introduced OSSFP, a precise and scalable TPL detection tool that uses a novel algorithm to select representative features. Given the lack of centralized management in the C/C++ community, constructing a comprehensive database is challenging, which limits the performance of state-of-the-art TPL detection tools. To overcome this, our second study proposed OCCLASS, which bypasses data limitations by identifying the authorship of source code files and expanding the pre-collected database. After the employed TPLs are identified, reporting the related CVEs involves mapping the TPL names to entries on the CVE publication website. However, a published CVE may not pertain to all components of a TPL, which leads to potential false positives. A finer granularity of TPL detection is thus required to report CVEs accurately. Our third study introduced B2SCOM, which precisely divides a TPL into components and maps CVEs to the corresponding components. Once CVEs are identified, the next step is their remediation. Our fourth study was an empirical study that identifies and classifies remediation strategies, aiming to discover the most effective methods for various users.
Software Composition Analysis (SCA) tools are indispensable for identifying TPLs in software projects and detecting associated CVEs, thus facilitating proactive vulnerability management and enhancing security. State-of-the-art SCA tools for detecting TPL source code typically use two methods: scanning bill-of-materials (BOM) files and performing code clone mapping between the target project and the TPLs in a repository. SCA tools perform well for languages such as Java and Python, which have centralized official package managers. However, applying these algorithms to C/C++ TPLs proves less effective. The C/C++ open-source ecosystem is more complex, not only because it lacks a commonly used package manager but also because of code cloning practices among TPLs. To address this, we propose OSSFP, a novel SCA framework that effectively and efficiently detects TPLs in large-scale real-world projects by generating unique fingerprints for open-source software. OSSFP significantly reduces the database size and speeds up detection by eliminating common and trivial functions and retaining only core functions to construct the fingerprint index for each TPL project.
Despite the significant improvements in accuracy and efficiency achieved by OSSFP, data constraints have emerged as a critical bottleneck. Because no public list of C/C++ TPLs is available, the accuracy of state-of-the-art TPL detection tools is limited by an incomplete database and by their inability to eliminate noisy features effectively. To overcome this issue, we propose a three-pronged strategy that adaptively mitigates data constraints and improves the accuracy of existing SCA tools. The strategy includes a heuristic method that recognizes copied code within each TPL repository without depending on a complete TPL database. We also identify 12,706 new libraries beyond the commonly used TPL list, further demonstrating the effectiveness of our algorithm in overcoming data constraints.
Despite advances in detecting TPL source code, state-of-the-art SCA methods often fail to report the CVEs related to target binaries accurately because their detection granularity is too coarse. A target binary typically contains only parts of a library's source code, so not every CVE reported for the library applies to it. To reduce the CVE false positives caused by this coarse library-level mapping, a more precise, fine-grained approach known as component mapping is needed. Accordingly, we introduce B2SCOM, which redefines the mapping relationship between compiled binary components and their corresponding source code files and identifies the specific source files for each component. Incorporating component detection after library identification in the SCA process significantly improves the accuracy of CVE reporting.
After identifying the vulnerabilities related to the TPLs in the target project, we conducted a comprehensive study on the taxonomy of vulnerability remediation countermeasures in open-source software projects and investigated their advantages and disadvantages in the context of potential mitigation. The study involved a comprehensive empirical analysis of a large set of security issues from GitHub. Our goal was to understand the scope and effectiveness of remediation strategies within the OSS community. We developed a hierarchical taxonomy of remediation strategies and evaluated their effectiveness and associated costs. This study offers practical findings and insights into mitigating security issues in OSS.
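Illustrative note on the description: the fingerprinting idea it outlines (discard functions that are common across libraries or trivial, keep the remaining core functions as each library's fingerprint) can be pictured with a minimal Python sketch. This is not OSSFP's actual algorithm, which this record does not specify. The function names, the length cutoff, the match threshold, and the demo libraries are all hypothetical, and the normalization is deliberately crude.

```python
import hashlib
import re
from collections import defaultdict


def normalize(body: str) -> str:
    """Crude normalization (an assumption): drop comments and all whitespace."""
    body = re.sub(r"/\*.*?\*/", "", body, flags=re.DOTALL)  # block comments
    body = re.sub(r"//[^\n]*", "", body)                    # line comments
    return re.sub(r"\s+", "", body)


def function_hash(body: str) -> str:
    """Hash of the normalized function text."""
    return hashlib.sha256(normalize(body).encode("utf-8")).hexdigest()


def build_fingerprints(libraries, min_length=20):
    """libraries maps a library name to a list of function bodies.

    Keeps only functions that are non-trivial (normalized length of at
    least min_length) and that occur in exactly one library.
    """
    owners = defaultdict(set)
    for lib, bodies in libraries.items():
        for body in bodies:
            if len(normalize(body)) >= min_length:
                owners[function_hash(body)].add(lib)
    fingerprints = {lib: set() for lib in libraries}
    for digest, libs in owners.items():
        if len(libs) == 1:  # shared (cloned) functions identify no library
            fingerprints[next(iter(libs))].add(digest)
    return fingerprints


def detect(target_bodies, fingerprints, threshold=0.5):
    """Report libraries whose fingerprint is matched at or above threshold."""
    target_hashes = {function_hash(b) for b in target_bodies}
    hits = {}
    for lib, fingerprint in fingerprints.items():
        if fingerprint:
            ratio = len(fingerprint & target_hashes) / len(fingerprint)
            if ratio >= threshold:
                hits[lib] = ratio
    return hits


# Hypothetical demo data: one function cloned into both libraries.
shared = "int max_int(int a, int b) { return a > b ? a : b; }"
zfunc = "int z_checksum(const char *p, int n) { int s = 0; while (n--) s += *p++; return s; }"
pfunc = "int p_scale(int v, int f) { /* scale */ return v * f + 1; }"
libs = {"libz_demo": [shared, zfunc], "libp_demo": [shared, pfunc]}

fps = build_fingerprints(libs)
print(detect([zfunc, "int main(void) { return 0; }"], fps))
# {'libz_demo': 1.0}
```

Because the cloned helper is excluded from both fingerprints, a project that copies only that helper matches neither library, which is the kind of false positive the description attributes to code cloning among TPLs.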
author2 Liu Yang
author_facet Liu Yang
Wu, Jiahui
format Thesis-Doctor of Philosophy
author Wu, Jiahui
author_sort Wu, Jiahui
title Detection and remediation of third-party library vulnerabilities in the C/C++ open-source ecosystem
publisher Nanyang Technological University
publishDate 2025
url https://hdl.handle.net/10356/182864
_version_ 1826362283761074176
spelling sg-ntu-dr.10356-182864 2025-03-05T00:29:50Z Detection and remediation of third-party library vulnerabilities in the C/C++ open-source ecosystem Wu, Jiahui Liu Yang College of Computing and Data Science yangliu@ntu.edu.sg Computer and Information Science Software engineering Doctor of Philosophy 2025-03-05T00:29:50Z 2025-03-05T00:29:50Z 2025 Thesis-Doctor of Philosophy Wu, J. (2025). Detection and remediation of third-party library vulnerabilities in the C/C++ open-source ecosystem. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/182864 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University