Semantic driven vulnerability detection and patch analysis

Bibliographic Details
Main Author:	Xu, Zhengzi
Other Authors:	Liu Yang
Format:	Thesis-Doctor of Philosophy
Language:	English
Published:	Nanyang Technological University 2021
Subjects:	Engineering::Computer science and engineering::Software::Software engineering
Online Access:	https://hdl.handle.net/10356/146232
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-146232
record_format	dspace
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering::Software::Software engineering
spellingShingle	Engineering::Computer science and engineering::Software::Software engineering Xu, Zhengzi Semantic driven vulnerability detection and patch analysis
description	Software vulnerabilities have become a major threat to software security, and many approaches have been proposed to search for vulnerabilities in both source code and binary programs. Code clone detection, which finds code in target programs that is similar to known vulnerable code, is an effective approach for identifying 1-day vulnerabilities (vulnerabilities that have already been disclosed but may remain unpatched elsewhere). However, existing work has several limitations: a lack of binary vulnerability data, inaccurate matching algorithms, noise-prone matching results, limited ability to detect new vulnerabilities, and a lack of mitigation methods. To address these limitations, we propose a framework that provides a complete solution: it collects known vulnerability information, matches 1-day and recurring vulnerabilities with high accuracy across different compilation settings in both binary and source code, and automatically generates hot patches to fix the flaws. First, searching for vulnerabilities by matching requires the signatures or patterns of known vulnerabilities as input, but at the binary level there is often little information about known vulnerabilities. To obtain binary vulnerability data for matching, we propose SPAIN, a tool that automatically analyzes program diffs across versions to identify security-related patches. It distinguishes vulnerability patches from other program changes using a partial trace execution technique. The experimental results show that it efficiently identifies security patches with 71% true positives and a false-positive rate below 22%. It can also find vulnerabilities that were patched secretly. Second, the same function may look different in binary form when compiled under different settings, so to detect binary vulnerabilities precisely, the framework must be able to match these variants. We therefore propose two cross-compiler, cross-optimization-level, and cross-architecture matching tools: Bingo and Bingo-E.
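SPAIN's results above are reported as a true-positive figure and a false-positive rate. The sketch below shows how such figures are commonly computed from a confusion matrix, assuming the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN); the thesis may define its figures differently. The class `PatchConfusion` and all counts are hypothetical and are not SPAIN's evaluation data.

```python
from dataclasses import dataclass


@dataclass
class PatchConfusion:
    """Hypothetical counts from labelling program changes as security patches or not."""
    tp: int  # security patches correctly flagged
    fn: int  # security patches missed
    fp: int  # non-security changes wrongly flagged
    tn: int  # non-security changes correctly left unflagged

    def true_positive_rate(self) -> float:
        # Share of actual security patches that were flagged.
        return self.tp / (self.tp + self.fn)

    def false_positive_rate(self) -> float:
        # Share of non-security changes that were wrongly flagged.
        return self.fp / (self.fp + self.tn)


# Hypothetical counts for illustration only, not SPAIN's evaluation data.
example = PatchConfusion(tp=8, fn=2, fp=1, tn=9)
print(f"TPR = {example.true_positive_rate():.2f}")   # TPR = 0.80
print(f"FPR = {example.false_positive_rate():.2f}")  # FPR = 0.10
```

The two rates use different denominators (all real security patches versus all non-security changes), which is why a tool can report a high true-positive figure and a separate, nonzero false-positive rate at the same time.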
Bingo uses a selective inlining technique to construct the complete semantics of a function. It then divides the function into variable-length traces and matches these traces to measure function similarity. Bingo-E extends Bingo with partial trace execution to capture the semantic features of functions. The evaluation shows that Bingo achieves 41.5% top-1 accuracy when matching CoreUtils binaries across different compilers and optimization levels, and Bingo-E further improves this to between 70.1% and 99.7% under the same settings. Third, existing work focuses only on improving the accuracy of function matching and does not take patch information into account. As a result, patched functions are usually predicted to be vulnerable, leading to a high false-positive rate. To address this problem, we propose BinXray, a tool that detects and filters out patched functions from the vulnerable function candidates. It uses a novel block mapping algorithm to compute the differences between vulnerable and patched functions and builds patch signatures from these differences. It then matches the generated patch signatures with a length-sensitive similarity matching algorithm to identify patched functions. Experiments show that BinXray identifies patched functions in 12 projects with 93% accuracy, taking 296.17 ms per function on average. Fourth, to fix the detected vulnerabilities efficiently and effectively, we propose a patch generation algorithm that leverages weakest precondition reasoning to learn from official patches and convert them into binary hot patches. We implement it in Vulmet, a prototype that generates semantics-preserving patches. The experimental results show that 55 real-world vulnerabilities in Android kernels were successfully converted into hot patches, which incur little performance overhead once applied to the system.
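The abstract names weakest precondition reasoning without giving details. As a generic illustration of the technique only (not Vulmet's algorithm), the sketch computes the weakest precondition of a straight-line block of assignments by backward substitution, wp(x := e, Q) = Q[e/x]. One plausible use, which is an assumption here, is re-expressing a check that an official patch adds after some computation as an equivalent condition over values available earlier, where a hot patch can hook. The expression classes, the variable names, and the guard are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class BinOp:
    op: str  # e.g. "+", "*", "<"
    left: Expr
    right: Expr


Expr = Union[Var, Const, BinOp]


def subst(e: Expr, name: str, repl: Expr) -> Expr:
    """Return e with every occurrence of variable `name` replaced by `repl`."""
    if isinstance(e, Var):
        return repl if e.name == name else e
    if isinstance(e, BinOp):
        return BinOp(e.op, subst(e.left, name, repl), subst(e.right, name, repl))
    return e  # constants are unchanged


def wp(block: list[tuple[str, Expr]], post: Expr) -> Expr:
    """Weakest precondition of a straight-line block of assignments.

    Uses wp(x := e, Q) = Q[e/x] and wp(S1; S2, Q) = wp(S1, wp(S2, Q)),
    so the block is walked backwards, substituting one assignment at a time.
    """
    cond = post
    for name, expr in reversed(block):
        cond = subst(cond, name, expr)
    return cond


def show(e: Expr) -> str:
    """Render an expression with full parenthesization."""
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Const):
        return str(e.value)
    return f"({show(e.left)} {e.op} {show(e.right)})"


# Hypothetical code preceding a patch-added guard: idx = off + 4; len = idx * 2
block = [
    ("idx", BinOp("+", Var("off"), Const(4))),
    ("len", BinOp("*", Var("idx"), Const(2))),
]
# Hypothetical guard inserted by an official patch: len < size
guard = BinOp("<", Var("len"), Var("size"))

print(show(wp(block, guard)))  # (((off + 4) * 2) < size)
```

Backward substitution like this is sound only for straight-line code without memory writes, aliasing, or calls; handling those in real binaries is where a practical patch generator needs considerably more machinery.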
Finally, at the source code level, traditional code-clone-based approaches can only find known vulnerabilities, and because they rely on syntax-level information, they are not robust to changes introduced into the target functions. To this end, we propose MVP, a source code vulnerability matching tool that summarizes the semantics of known vulnerabilities and then matches new vulnerabilities that share similar logic. Experiments show that MVP detects 97 new vulnerabilities in 10 commonly used projects. It takes 17,272.82 milliseconds on average to extract the vulnerability signatures and less than 100 milliseconds to match them in the target programs.
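MVP's signature extraction and matching are only summarized above. The sketch below illustrates one plausible shape of signature-based matching under loudly stated assumptions: a signature is a set of hashes of normalized statements (here normalization only collapses whitespace, far weaker than semantic summarization), and a function is reported when it contains most of a vulnerability signature but little of a patch signature (the statements the fix adds). The function names, thresholds, and C snippets are hypothetical and are not MVP's actual algorithm or parameters.

```python
from __future__ import annotations

import hashlib


def normalize(stmt: str) -> str:
    # Hypothetical normalization: collapse runs of whitespace only.
    return " ".join(stmt.split())


def signature(stmts: list[str]) -> set[str]:
    """Hash each normalized statement; the set of hashes is the signature."""
    return {hashlib.sha256(normalize(s).encode("utf-8")).hexdigest() for s in stmts}


def coverage(sig: set[str], target: set[str]) -> float:
    """Fraction of the signature's hashes that also occur in the target."""
    return len(sig & target) / len(sig) if sig else 0.0


def looks_vulnerable(target_stmts: list[str], vuln_sig: set[str], patch_sig: set[str],
                     vuln_threshold: float = 0.8, patch_threshold: float = 0.5) -> bool:
    """Report a match if the vulnerable logic is present and the fix is absent."""
    target = signature(target_stmts)
    return (coverage(vuln_sig, target) >= vuln_threshold
            and coverage(patch_sig, target) < patch_threshold)


# Hypothetical known vulnerability and the statement its patch adds.
vuln_sig = signature(["n = read_len(buf);", "memcpy(dst, buf, n);"])
patch_sig = signature(["if (n > sizeof(dst)) return -1;"])

# A modified clone (extra statement, different spacing) and a patched copy.
cloned = ["n  =  read_len(buf);", "log(n);", "memcpy(dst, buf, n);"]
patched = ["n = read_len(buf);", "if (n > sizeof(dst)) return -1;", "memcpy(dst, buf, n);"]

print(looks_vulnerable(cloned, vuln_sig, patch_sig))   # True
print(looks_vulnerable(patched, vuln_sig, patch_sig))  # False
```

Requiring the patch signature to be absent is what keeps already-fixed copies from being reported, the same false-positive problem the abstract raises for binary function matching.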
author2	Liu Yang
author_facet	Liu Yang; Xu, Zhengzi
format	Thesis-Doctor of Philosophy
author	Xu, Zhengzi
author_sort	Xu, Zhengzi
title	Semantic driven vulnerability detection and patch analysis
title_short	Semantic driven vulnerability detection and patch analysis
title_full	Semantic driven vulnerability detection and patch analysis
title_fullStr	Semantic driven vulnerability detection and patch analysis
title_full_unstemmed	Semantic driven vulnerability detection and patch analysis
title_sort	semantic driven vulnerability detection and patch analysis
publisher	Nanyang Technological University
publishDate	2021
url	https://hdl.handle.net/10356/146232
_version_	1695706160928129024
spelling	sg-ntu-dr.10356-146232 2021-03-09T15:50:05Z Semantic driven vulnerability detection and patch analysis Xu, Zhengzi Liu Yang School of Computer Science and Engineering yangliu@ntu.edu.sg Engineering::Computer science and engineering::Software::Software engineering Doctor of Philosophy 2021-02-03T04:50:14Z 2021-02-03T04:50:14Z 2021 Thesis-Doctor of Philosophy Xu, Z. (2021). Semantic driven vulnerability detection and patch analysis. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/146232 10.32657/10356/146232 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University

Similar Items