Semantic-based vulnerability analysis of software

Fuzzing is a popular approach for software vulnerability detection among both academia and industry. Software toolkits which perform fuzzing automatically are called fuzzers. Based on the extent to which the fuzzer knows about the internals of the target program, fuzzers can be categorized as bla...

Full description

Saved in:
Bibliographic Details
Main Author: Li, Yuekang
Other Authors: Liu Yang
Format: Thesis-Doctor of Philosophy
Language:English
Published: Nanyang Technological University 2020
Subjects:
Online Access:https://hdl.handle.net/10356/142914
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-142914
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
spellingShingle Engineering::Computer science and engineering
Li, Yuekang
Semantic-based vulnerability analysis of software
description Fuzzing is a popular approach for software vulnerability detection among both academia and industry. Software toolkits which perform fuzzing automatically are called fuzzers. Based on the extent to which the fuzzer knows about the internals of the target program, fuzzers can be categorized as blackbox, greybox and whitebox. Blackbox fuzzers are fast and scalable, but lack effectiveness. Whitebox fuzzers can capture every execution state of the target program. They often leverage on symbolic execution or taint analysis techniques to perform fuzzing, which can improve the effectiveness but will also drastically limit the scalability. Having the best of both worlds, greybox fuzzing strikes a balance between scalability and effectiveness. This is achieved by using light-weight instrumentations to collect only the needed information for fuzzing. In this thesis, we focus on discussing the techniques to improve greybox fuzzing if it is used for good purpose and the methods to combat greybox fuzzing if it is used by malicious parties. First of all, we develop a general greybox fuzzing framework called Fuzzing Orchestration Toolkit (FOT). The design of FOT is inspired by American Fuzzy Lop (AFL), which is currently the most widely used greybox fuzzer. The purpose of developing FOT is that we want a modular greybox fuzzer for easy extension and trying out new ideas. Compared to AFL, FOT is extensible and new features can be introduced with significantly overhead on fuzzing efficiency Besides, with improved flexibility, the parallel fuzzing mechanism of FOT is also different from AFL. In FOT, we divide the greybox fuzzer into several different but connected components. These components are interconnected while each of them is a stand alone module. This enables us to modify FOT for experimental features with ease. With FOT and its extensions, we have found dozens of previously unknown bugs in several widely used open source projects. By discussing the design of FOT, we can establish a better understanding about greybox fuzzing as a basis for comprehending the next works in this thesis.Second, we study an approach to improve the effectiveness of greybox fuzzing by allowing it to penetrate through magic word comparisons in the target program. One of the key bottlenecks in greybox fuzzing is that the fuzzers can hardly get to the true branch of an if condition with magic word comparisons. This is because the chance of generating a magic word via randomly mutating the input file is very low and often, negligible. To address this challenge, we propose a divide-and-conquer approach of splitting the magic word comparison into several small steps to guide the greybox fuzzer through. The proposed approach is called Steelix. We conducted thorough experiments on three different sets of programs/benchmarks and verfied the effectivness of Steelix. Furthermore, with Steelix, we found one CVE and nine previously unknown bugs. Third, we focus on an approach of adopting multi-objective optimization (MOO) for seed selection of greybox fuzzing. This approach is called Cerebro. Seed selection is an essential problem in greybox fuzzing. It directly affects how fast the entire fuzzing process can converge and it is especially important if the target program is large. This is because greybox fuzzers often need to maintain a large set of seed inputs when fuzzing big programs, raising the importance of seed quality evaluation. In Cerebro, we model the evaluation of seed quality as an MOO problem and propose a novel online algorithm to solve the problem. Besides, we also propose a new property, called input potential for seed quality evaluation in Cerebro. Our experiments showed that Cerebro can detect around 50% more crashes than other state-of-the-art greybox fuzzers on average in 24 hours. With Cerebro, we found one CVE and fourteen previously unknown bugs. Lastly, we concentrate on methods for anti-greybox fuzzing. In recent years, many techniques have been proposed to improve the effectiveness and efficiency of greybox fuzzing. On one hand, these advances help the security experts to detect vulnerabilities earlier for patching. On the other hand, these techniques may be used by malicious users to hunt vulnerabilities and build exploits for commercial-off-the-shelf-software (COTS). To limit the usage of greybox fuzzing by malicious users, we propose a technique called Vall-nut. In Vall-nut, instead of building strategies for different implementations of greybox fuzzers, we generalize a model of greybox fuzzing and propose counteractive strategies according to the key components of the model. The proposed strategies can hinder the greybox fuzzers by limiting their code coverage and bug detection capability. To evaluate the performance of Vall-nut, we use two greybox fuzzers and six real-world programs. With experiments, we showed that greybox fuzzers can perform even worse than blackbox fuzzers on the target program protected by Vall-nut. The experiments also showed that Vall-nut can reduce the number of crashes detected by greybox fuzzers by 76% and reduce the code coverage by 34% on average.
author2 Liu Yang
author_facet Liu Yang
Li, Yuekang
format Thesis-Doctor of Philosophy
author Li, Yuekang
author_sort Li, Yuekang
title Semantic-based vulnerability analysis of software
title_short Semantic-based vulnerability analysis of software
title_full Semantic-based vulnerability analysis of software
title_fullStr Semantic-based vulnerability analysis of software
title_full_unstemmed Semantic-based vulnerability analysis of software
title_sort semantic-based vulnerability analysis of software
publisher Nanyang Technological University
publishDate 2020
url https://hdl.handle.net/10356/142914
_version_ 1683494473180381184
spelling sg-ntu-dr.10356-1429142020-10-28T08:40:28Z Semantic-based vulnerability analysis of software Li, Yuekang Liu Yang School of Computer Science and Engineering yangliu@ntu.edu.sg Engineering::Computer science and engineering Fuzzing is a popular approach for software vulnerability detection among both academia and industry. Software toolkits which perform fuzzing automatically are called fuzzers. Based on the extent to which the fuzzer knows about the internals of the target program, fuzzers can be categorized as blackbox, greybox and whitebox. Blackbox fuzzers are fast and scalable, but lack effectiveness. Whitebox fuzzers can capture every execution state of the target program. They often leverage on symbolic execution or taint analysis techniques to perform fuzzing, which can improve the effectiveness but will also drastically limit the scalability. Having the best of both worlds, greybox fuzzing strikes a balance between scalability and effectiveness. This is achieved by using light-weight instrumentations to collect only the needed information for fuzzing. In this thesis, we focus on discussing the techniques to improve greybox fuzzing if it is used for good purpose and the methods to combat greybox fuzzing if it is used by malicious parties. First of all, we develop a general greybox fuzzing framework called Fuzzing Orchestration Toolkit (FOT). The design of FOT is inspired by American Fuzzy Lop (AFL), which is currently the most widely used greybox fuzzer. The purpose of developing FOT is that we want a modular greybox fuzzer for easy extension and trying out new ideas. Compared to AFL, FOT is extensible and new features can be introduced with significantly overhead on fuzzing efficiency Besides, with improved flexibility, the parallel fuzzing mechanism of FOT is also different from AFL. In FOT, we divide the greybox fuzzer into several different but connected components. These components are interconnected while each of them is a stand alone module. This enables us to modify FOT for experimental features with ease. With FOT and its extensions, we have found dozens of previously unknown bugs in several widely used open source projects. By discussing the design of FOT, we can establish a better understanding about greybox fuzzing as a basis for comprehending the next works in this thesis.Second, we study an approach to improve the effectiveness of greybox fuzzing by allowing it to penetrate through magic word comparisons in the target program. One of the key bottlenecks in greybox fuzzing is that the fuzzers can hardly get to the true branch of an if condition with magic word comparisons. This is because the chance of generating a magic word via randomly mutating the input file is very low and often, negligible. To address this challenge, we propose a divide-and-conquer approach of splitting the magic word comparison into several small steps to guide the greybox fuzzer through. The proposed approach is called Steelix. We conducted thorough experiments on three different sets of programs/benchmarks and verfied the effectivness of Steelix. Furthermore, with Steelix, we found one CVE and nine previously unknown bugs. Third, we focus on an approach of adopting multi-objective optimization (MOO) for seed selection of greybox fuzzing. This approach is called Cerebro. Seed selection is an essential problem in greybox fuzzing. It directly affects how fast the entire fuzzing process can converge and it is especially important if the target program is large. This is because greybox fuzzers often need to maintain a large set of seed inputs when fuzzing big programs, raising the importance of seed quality evaluation. In Cerebro, we model the evaluation of seed quality as an MOO problem and propose a novel online algorithm to solve the problem. Besides, we also propose a new property, called input potential for seed quality evaluation in Cerebro. Our experiments showed that Cerebro can detect around 50% more crashes than other state-of-the-art greybox fuzzers on average in 24 hours. With Cerebro, we found one CVE and fourteen previously unknown bugs. Lastly, we concentrate on methods for anti-greybox fuzzing. In recent years, many techniques have been proposed to improve the effectiveness and efficiency of greybox fuzzing. On one hand, these advances help the security experts to detect vulnerabilities earlier for patching. On the other hand, these techniques may be used by malicious users to hunt vulnerabilities and build exploits for commercial-off-the-shelf-software (COTS). To limit the usage of greybox fuzzing by malicious users, we propose a technique called Vall-nut. In Vall-nut, instead of building strategies for different implementations of greybox fuzzers, we generalize a model of greybox fuzzing and propose counteractive strategies according to the key components of the model. The proposed strategies can hinder the greybox fuzzers by limiting their code coverage and bug detection capability. To evaluate the performance of Vall-nut, we use two greybox fuzzers and six real-world programs. With experiments, we showed that greybox fuzzers can perform even worse than blackbox fuzzers on the target program protected by Vall-nut. The experiments also showed that Vall-nut can reduce the number of crashes detected by greybox fuzzers by 76% and reduce the code coverage by 34% on average. Doctor of Philosophy 2020-07-09T01:41:41Z 2020-07-09T01:41:41Z 2019 Thesis-Doctor of Philosophy Li, Y. (2019). Semantic-based vulnerability analysis of software. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/142914 10.32657/10356/142914 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University