Detection and remediation of third-party library vulnerabilities in the C/C++ open-source ecosystem
Open-source software (OSS) has gained significant importance in software development. In the OSS ecosystem, third-party libraries (TPLs) play a crucial role in enhancing development efficiency. However, these libraries could introduce potential security risks to software projects. To mitigate and man...
Main Author: | Wu, Jiahui |
---|---|
Other Authors: | Liu Yang |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2025 |
Subjects: | Computer and Information Science; Software engineering |
Online Access: | https://hdl.handle.net/10356/182864 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-182864 |
---|---|
record_format |
dspace |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Computer and Information Science Software engineering |
description |
Open-source software (OSS) has gained significant importance in software development. In the OSS ecosystem, third-party libraries (TPLs) play a crucial role in enhancing development efficiency. However, these libraries could introduce potential security risks to software projects. To mitigate and manage these risks, including Common Vulnerabilities and Exposures (CVEs), it is crucial to detect the employed TPLs, thereby identifying and addressing the associated risks. For TPL detection, we introduced OSSFP in our first study, a precise and scalable TPL detection tool that utilizes a novel algorithm to select representative features. Given the lack of centralized management in the C/C++ community, constructing a comprehensive database is challenging, which limits the performance of the state-of-the-art TPL detection tools. To overcome this, in our second study, we proposed OCCLASS, which bypasses data limitations by identifying the authorship of source code files and expanding the pre-collected database. After identifying the employed TPLs, reporting the related CVEs involves mapping the TPL names on the CVE publication website. However, the published CVEs may not pertain to all components of a TPL, leading to potential false positives. A finer granularity of TPL detection is thus required to accurately report CVEs. In our third study, we introduced B2SCOM, which precisely divides the TPL into components and maps CVEs to the corresponding components. Upon identifying CVEs, the subsequent step is their remediation. In our fourth study, we conducted an empirical study to identify and classify remediation strategies, aiming to discover the most effective methods for various users.
Software Composition Analysis (SCA) tools are indispensable for identifying TPLs in software projects and detecting associated CVEs, thus facilitating proactive vulnerability management and enhancing security. State-of-the-art SCA tools for detecting TPL source code typically utilize two methods: scanning bill-of-materials (BOM) files and performing code clone mapping between the target project and the TPLs in a repository. SCA tools perform well in languages such as Java and Python, which utilize centralized official package managers. However, applying these algorithms to detect C/C++ TPLs proves to be less effective. The open-source ecosystem for C/C++ is more complex, not only due to the absence of a commonly used package manager but also because of code cloning practices among TPLs. To address this, we propose OSSFP, a novel SCA framework that effectively and efficiently detects TPLs in large-scale real-world projects by generating unique fingerprints for open-source software. OSSFP significantly reduces the database size and speeds up the detection process by eliminating common and trivial functions, retaining only core functions to construct the fingerprint index for each TPL project.
Despite the significant accuracy and efficiency improvements achieved by OSSFP, data constraints have emerged as a critical bottleneck. Because no publicly available list of C/C++ TPLs exists, the accuracy of state-of-the-art TPL detection tools is limited by an incomplete database and by their inability to eliminate noisy features effectively. To address this issue, we propose a three-fold strategy that adaptively mitigates data constraints and improves the accuracy of existing SCA tools. The strategy includes a heuristic method that recognizes copied code within each TPL repository without depending on a complete TPL database. Furthermore, we identify an additional 12,706 new libraries beyond the commonly used existing TPL list, further demonstrating the effectiveness of our algorithm in addressing the challenges of data constraints.
Despite advances in detecting TPL source code, state-of-the-art SCA methods often fail to accurately report CVEs related to target binaries due to coarse granularity in detection. These binaries typically comprise only segments of the library source code, which may not correspond to all reported CVEs. To reduce CVE false positives associated with this coarse library-level mapping, transitioning to a more precise, fine-grained approach known as component mapping is crucial. Accordingly, we introduce B2SCOM, which redefines the mapping relationship between compiled binary components and their corresponding source code files, identifying specific source files for each component. Furthermore, incorporating component detection after library identification in the SCA process significantly improves the accuracy of CVE reporting.
After successfully identifying vulnerabilities related to the TPLs in the target project, we conducted a comprehensive study on the taxonomy of vulnerability remediation countermeasures in open-source software projects. We investigated their advantages and disadvantages in the context of potential mitigation. Our study involved a comprehensive empirical analysis of large-scale security issues from GitHub. Our goal was to understand the scope and effectiveness of remediation strategies within the OSS community. We developed a hierarchical taxonomy of remediation strategies, evaluating their effectiveness and associated costs. This study offers practical findings and insights into mitigating security issues in OSS. |
author2 |
Liu Yang |
author_facet |
Liu Yang Wu, Jiahui |
format |
Thesis-Doctor of Philosophy |
author |
Wu, Jiahui |
author_sort |
Wu, Jiahui |
title |
Detection and remediation of third-party library vulnerabilities in the C/C++ open-source ecosystem |
publisher |
Nanyang Technological University |
publishDate |
2025 |
url |
https://hdl.handle.net/10356/182864 |
_version_ |
1826362283761074176 |
spelling |
sg-ntu-dr.10356-182864 2025-03-05T00:29:50Z Detection and remediation of third-party library vulnerabilities in the C/C++ open-source ecosystem Wu, Jiahui Liu Yang College of Computing and Data Science yangliu@ntu.edu.sg Computer and Information Science Software engineering Doctor of Philosophy 2025 Thesis-Doctor of Philosophy Wu, J. (2025). Detection and remediation of third-party library vulnerabilities in the C/C++ open-source ecosystem. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/182864 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University |
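Illustrative note on the abstract: the description says OSSFP builds a fingerprint index by eliminating common and trivial functions and retaining only core functions. The record does not give the algorithm itself, so the following is only a minimal sketch of that general idea, not the thesis's method. The function names, the whitespace-only normalization, the token-count test for "trivial", and the `max_shared` and `threshold` parameters are all assumptions made for illustration.

```python
from collections import defaultdict
import hashlib


def function_hash(body: str) -> str:
    """Hash a function body after collapsing whitespace (a crude, assumed normalization)."""
    normalized = " ".join(body.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_fingerprints(libraries, max_shared=1, min_tokens=4):
    """Return {library: set of hashes}, keeping only non-trivial, non-common functions.

    libraries maps a library name to a list of function bodies (strings).
    A function is treated as trivial if it has fewer than min_tokens
    whitespace-separated tokens, and as common if it appears in more than
    max_shared libraries. Both rules are hypothetical stand-ins.
    """
    owners = defaultdict(set)
    for name, functions in libraries.items():
        for body in functions:
            if len(body.split()) >= min_tokens:  # drop trivial functions
                owners[function_hash(body)].add(name)

    fingerprints = {name: set() for name in libraries}
    for digest, names in owners.items():
        if len(names) <= max_shared:  # drop functions cloned across libraries
            for name in names:
                fingerprints[name].add(digest)
    return fingerprints


def detect(target_functions, fingerprints, threshold=0.5):
    """Report libraries whose fingerprint is covered by the target at or above threshold."""
    target = {function_hash(body) for body in target_functions}
    hits = []
    for name, fingerprint in fingerprints.items():
        if fingerprint and len(fingerprint & target) / len(fingerprint) >= threshold:
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    # Toy data: both libraries contain the same cloned helper.
    libs = {
        "libalpha": [
            "int add(int a, int b) { return a + b; }",
            "int alpha_parse(const char *s) { return s[0] - '0'; }",
        ],
        "libbeta": [
            "int add(int a, int b) { return a + b; }",
            "int beta_scale(int x) { return x * 3; }",
        ],
    }
    index = build_fingerprints(libs)
    print(detect(["int alpha_parse(const char *s) { return s[0] - '0'; }"], index))
```

In this sketch the cloned helper contributes to neither fingerprint, so a target that copies only that helper matches no library, which is the kind of false positive that filtering common functions is meant to avoid. A smaller index also means fewer comparisons at detection time.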