Data driven security analysis of open source software
Main Author: | Liu, Chengwei |
---|---|
Other Authors: | Liu Yang |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2023 |
Subjects: | Engineering::Computer science and engineering |
Online Access: | https://hdl.handle.net/10356/168554 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-168554 |
---|---|
record_format |
dspace |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering |
description |
Third-party libraries (TPLs) with rich functionalities have facilitated the rapid development of modern software, fueling the explosive growth of open-source ecosystems and software supply chains. However, the wide reuse of TPLs as dependencies, especially popular ones, also poses a new threat: TPLs are black boxes to developers, and security flaws hidden in them can expose downstream users to attacks. With growing global awareness of open-source supply chain security, many research efforts have sought to identify, understand, and mitigate these risks. However, most existing works and tools operate at a coarse granularity, i.e., they identify TPL dependencies by reasoning over dependency networks, and they detect and remediate vulnerabilities based on their mere presence while neglecting whether they can actually be triggered. This largely compromises the effectiveness of existing solutions.
To fill this gap, we conduct several studies from different angles to investigate, measure, and mitigate such security threats from upstream TPLs.
First, to understand the vulnerability threats posed by TPL dependencies, we conduct an empirical study to demystify vulnerability impact and its evolution in the NPM ecosystem, one of the largest open-source ecosystems. Specifically, we first propose and construct a complete Dependency-Vulnerability Knowledge Graph (DVGraph) that captures the dependency relations among NPM packages. Based on it, we design a Dependency Tree Resolution Algorithm (DTResolver) that precisely resolves dependency trees without actually installing the packages. Using both, we then carry out an ecosystem-wide empirical study to gain insights into how vulnerability impact propagates and evolves in the NPM ecosystem.
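The abstract does not describe how DTResolver works. As a rough illustration of what resolving a dependency tree "without installation" can mean, the Python 3.9+ sketch below builds an npm-like `node_modules` tree purely from registry metadata. Everything in it is an assumption made for illustration: the in-memory `REGISTRY`, the package names, the restriction to caret ranges with major version 1 or higher, and the simplified placement rule (hoist a package to the top level if no copy exists yet, reuse a visible copy that satisfies the range, otherwise nest it under the requiring package). Real npm resolution covers many more cases, and this is not the thesis's algorithm.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field

Version = tuple[int, int, int]

# Hypothetical registry metadata: package -> version -> {dependency: caret range}.
REGISTRY: dict[str, dict[str, dict[str, str]]] = {
    "a": {"1.0.0": {"c": "^1.0.0"}, "1.4.0": {"c": "^1.0.0"}},
    "b": {"2.1.0": {"c": "^2.0.0"}},
    "c": {"1.2.0": {}, "2.0.0": {}, "2.3.1": {}},
}


def parse(v: str) -> Version:
    major, minor, patch = (int(x) for x in v.split("."))
    return major, minor, patch


def satisfies(version: str, rng: str) -> bool:
    """Caret range, assuming major >= 1: same major version and not below the base."""
    base, v = parse(rng.lstrip("^")), parse(version)
    return v[0] == base[0] and v >= base


def max_satisfying(name: str, rng: str) -> str:
    """Pick the highest published version that satisfies the range."""
    candidates = [v for v in REGISTRY[name] if satisfies(v, rng)]
    if not candidates:
        raise LookupError(f"no version of {name} satisfies {rng}")
    return max(candidates, key=parse)


@dataclass(eq=False)
class Node:
    name: str
    version: str
    parent: Node | None = field(default=None, repr=False)
    children: dict[str, Node] = field(default_factory=dict)  # this node's node_modules


def visible(node: Node, name: str) -> Node | None:
    """Node.js-style lookup: the node's own node_modules first, then each ancestor's."""
    cur: Node | None = node
    while cur is not None:
        if name in cur.children:
            return cur.children[name]
        cur = cur.parent
    return None


def resolve(root_deps: dict[str, str]) -> Node:
    """Breadth-first resolution that reads only registry metadata and installs nothing."""
    root = Node("<root>", "0.0.0")
    queue = deque([(root, root_deps)])
    while queue:
        node, deps = queue.popleft()
        for name, rng in deps.items():
            found = visible(node, name)
            if found is not None and satisfies(found.version, rng):
                continue  # reuse the copy this package would already load
            # Hoist to the top level if no copy exists yet, else nest under `node`.
            host = root if name not in root.children else node
            version = max_satisfying(name, rng)
            child = Node(name, version, parent=host)
            host.children[name] = child
            queue.append((child, REGISTRY[name][version]))
    return root


def show(node: Node, depth: int = 0) -> list[str]:
    """Render the resolved tree, indenting nested node_modules entries."""
    lines: list[str] = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.name}@{child.version}")
        lines.extend(show(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(show(resolve({"a": "^1.0.0", "b": "^2.0.0"}))))
```

In this toy run, `c@1.2.0` is hoisted to the top level on behalf of `a`. `b` receives its own nested `c@2.3.1`, because the hoisted copy does not satisfy `^2.0.0`.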
Next, the presence of a vulnerability in a project's dependencies does not necessarily mean that the project is affected by it. We therefore move from the package level to the code level and use call graphs to reduce false positives in vulnerability impact analysis. To this end, we implement JSReach, a static call graph generator for Node.js that checks the reachability of vulnerabilities. JSReach computes static call graphs not only for Node.js projects with full dependencies but also for cases where only dependency paths are provided, so that ecosystem-wide vulnerability impact analysis can be conducted more precisely. Our experiments show that JSReach achieves high precision (87%) and recall (95%) when full dependencies are available, and preserves most of the reachable functions (88%) when only dependency paths are provided. Moreover, JSReach successfully excludes 78% of unreachable vulnerabilities without missing any reachable ones.
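Building a sound call graph for JavaScript is the hard part of JSReach and is not reproduced here. Assuming a call graph is already available, the reachability check itself reduces to a graph search from the project's entry points. The sketch below uses a hypothetical hand-written call graph (`CALL_GRAPH`, with made-up `package:function` identifiers) and breadth-first search. For each vulnerable function, it reports either one witness call path or `None` when the function is unreachable.

```python
from __future__ import annotations

from collections import deque

# Hypothetical call graph: caller -> callees, using made-up "package:function" ids.
CALL_GRAPH: dict[str, list[str]] = {
    "app:main": ["app:loadConfig", "pkg-a:render"],
    "app:loadConfig": ["pkg-b:deepMerge"],
    "pkg-a:render": ["pkg-a:escape"],
    "pkg-b:deepMerge": ["pkg-b:assign"],
    "pkg-b:template": ["pkg-b:compile"],  # exported by pkg-b but never called by the app
}


def reachable_path(graph: dict[str, list[str]], entries: list[str], target: str) -> list[str] | None:
    """Breadth-first search from the entry points; return one call path to target, or None."""
    parent: dict[str, str | None] = {e: None for e in entries}
    queue = deque(entries)
    while queue:
        fn = queue.popleft()
        if fn == target:
            path: list[str] = []
            node: str | None = fn
            while node is not None:  # walk parent links back to an entry point
                path.append(node)
                node = parent[node]
            return path[::-1]
        for callee in graph.get(fn, []):
            if callee not in parent:
                parent[callee] = fn
                queue.append(callee)
    return None


def triage(graph: dict[str, list[str]], entries: list[str], vulnerable: list[str]) -> dict[str, list[str] | None]:
    """Map each vulnerable function to a witness path (reachable) or None (unreachable)."""
    return {fn: reachable_path(graph, entries, fn) for fn in vulnerable}


if __name__ == "__main__":
    report = triage(CALL_GRAPH, ["app:main"], ["pkg-b:assign", "pkg-b:compile"])
    for fn, path in report.items():
        print(fn, "->", " -> ".join(path) if path else "unreachable")
```

Here `pkg-b:assign` is reached via `app:main`, `app:loadConfig` and `pkg-b:deepMerge`. `pkg-b:compile` is unreachable: its package is a dependency, yet the vulnerable code is never called. Package-level analysis cannot exclude this kind of false positive.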
Third, based on DVGraph and JSReach, we carry out an ecosystem-wide study that re-investigates vulnerability impact at a finer granularity: the reachability of vulnerabilities. Our findings reveal how vulnerabilities propagate through API calls to threaten downstream dependents in the NPM ecosystem. Building on these findings, we propose a Reachability metric that indicates how likely a user project is to be affected by a given vulnerability. We also implement a lightweight tool, VREstimate, that prioritizes vulnerability remediation by Estimating Vulnerability Reachability from empirical statistics. Our experiments show that 90.28% of reachable vulnerabilities are reflected by the Reachability metric and that VREstimate successfully prioritizes reachable vulnerabilities by assigning them higher Reachability values. VREstimate can therefore be adapted to assist traditional software composition analysis (SCA) in prioritizing vulnerability remediation.
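The abstract does not give the formula behind the Reachability metric. The sketch below is a hypothetical illustration of prioritizing remediation "based on empirical statistics". It scores each vulnerability by a Laplace-smoothed rate of how often sampled dependents were found to reach it, then sorts vulnerabilities by that score. The `VulnStats` fields, the smoothing, and the sample numbers are all assumptions, not VREstimate's actual definition.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VulnStats:
    vuln_id: str  # hypothetical identifier
    reached: int  # sampled dependents whose call graphs reach the vulnerable function
    sampled: int  # sampled dependents that were analyzed


def reachability_score(s: VulnStats) -> float:
    """Laplace-smoothed reachability rate: (reached + 1) / (sampled + 2)."""
    return (s.reached + 1) / (s.sampled + 2)


def prioritize(stats: list[VulnStats]) -> list[VulnStats]:
    """Highest estimated reachability first; ties broken by identifier for stable output."""
    return sorted(stats, key=lambda s: (-reachability_score(s), s.vuln_id))


if __name__ == "__main__":
    sample = [
        VulnStats("V1", reached=45, sampled=50),
        VulnStats("V2", reached=2, sampled=40),
        VulnStats("V3", reached=0, sampled=0),  # no observations yet
    ]
    for s in prioritize(sample):
        print(f"{s.vuln_id}: {reachability_score(s):.3f}")
```

Smoothing keeps a vulnerability with no observations (such as `V3`) at a neutral 0.5 instead of ranking it last, so missing data does not silently deprioritize it.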
Fourth, beyond vulnerabilities, unreliable maintenance can also degrade the quality and security of TPLs, and such untrustworthiness, especially in critical TPLs, can further threaten the wider community. Therefore, to secure the development process of critical TPLs, we first propose a systematic method for identifying the most critical packages in the Maven ecosystem based on the Fused Maven Dependency Graph (MavenFG). We then investigate the development process of these critical packages and identify the weak points in their maintenance. Based on these findings, we propose countermeasures that could guide future open-source governance.
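The abstract does not describe how MavenFG ranks packages. One common proxy for criticality in a dependency graph is the number of transitive dependents, i.e. how many packages would be exposed if a given package were compromised. The sketch below computes this proxy on a small hypothetical graph (`DEPENDS_ON`) by reversing the edges and running a breadth-first search from each package. It illustrates the general idea only and is not the thesis's method.

```python
from __future__ import annotations

from collections import deque

# Hypothetical fused dependency graph: package -> packages it depends on.
DEPENDS_ON: dict[str, set[str]] = {
    "app-1": {"web-fw", "json-lib"},
    "app-2": {"web-fw"},
    "app-3": {"json-lib"},
    "web-fw": {"logging", "json-lib"},
    "json-lib": {"logging"},
    "logging": set(),
}


def transitive_dependents(graph: dict[str, set[str]]) -> dict[str, int]:
    """Count, for every package, how many packages depend on it directly or indirectly."""
    dependents: dict[str, set[str]] = {pkg: set() for pkg in graph}
    for pkg, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)  # reverse each edge

    counts: dict[str, int] = {}
    for pkg in dependents:
        seen: set[str] = set()
        queue = deque([pkg])
        while queue:
            for user in dependents[queue.popleft()]:
                if user not in seen:
                    seen.add(user)
                    queue.append(user)
        seen.discard(pkg)  # ignore self-reach through dependency cycles
        counts[pkg] = len(seen)
    return counts


def most_critical(graph: dict[str, set[str]], k: int) -> list[str]:
    """Top-k packages by transitive dependents, ties broken alphabetically."""
    counts = transitive_dependents(graph)
    return sorted(counts, key=lambda p: (-counts[p], p))[:k]


if __name__ == "__main__":
    counts = transitive_dependents(DEPENDS_ON)
    for pkg in most_critical(DEPENDS_ON, k=3):
        print(pkg, counts[pkg])
```

`logging` ranks first even though only two packages depend on it directly, because every other package in the graph reaches it transitively.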
In summary, this research reveals vulnerability threats in the NPM ecosystem at different granularities and proposes well-validated tools that mitigate these threats from different angles. Further implications and research directions remain to be explored in future work. |
author2 |
Liu Yang |
format |
Thesis-Doctor of Philosophy |
author |
Liu, Chengwei |
title |
Data driven security analysis of open source software |
publisher |
Nanyang Technological University |
publishDate |
2023 |
url |
https://hdl.handle.net/10356/168554 |
spelling |
sg-ntu-dr.10356-168554 2023-07-04T01:52:12Z. Data driven security analysis of open source software. Liu, Chengwei; Liu Yang. School of Computer Science and Engineering, Cyber Security Lab, yangliu@ntu.edu.sg. Engineering::Computer science and engineering. Doctor of Philosophy. 2023-06-07T02:53:46Z. 2023. Thesis-Doctor of Philosophy. Liu, C. (2023). Data driven security analysis of open source software. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/168554. DOI: 10.32657/10356/168554. en. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf. Nanyang Technological University |