Semantic-based vulnerability analysis of software
Fuzzing is a popular approach for software vulnerability detection among both academia and industry. Software toolkits which perform fuzzing automatically are called fuzzers. Based on the extent to which the fuzzer knows about the internals of the target program, fuzzers can be categorized as bla...
Saved in:
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: |
Nanyang Technological University
2020
|
Subjects: | |
Online Access: | https://hdl.handle.net/10356/142914 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Nanyang Technological University |
Language: | English |
Summary: | Fuzzing is a popular approach for software vulnerability detection among both
academia and industry. Software toolkits which perform fuzzing automatically are
called fuzzers. Based on the extent to which the fuzzer knows about the internals of the
target program, fuzzers can be categorized as blackbox, greybox and whitebox. Blackbox
fuzzers are fast and scalable, but lack effectiveness. Whitebox fuzzers can capture
every execution state of the target program. They often leverage on symbolic execution
or taint analysis techniques to perform fuzzing, which can improve the effectiveness
but will also drastically limit the scalability. Having the best of both worlds, greybox
fuzzing strikes a balance between scalability and effectiveness. This is achieved by using
light-weight instrumentations to collect only the needed information for fuzzing.
In this thesis, we focus on discussing the techniques to improve greybox fuzzing if it
is used for good purpose and the methods to combat greybox fuzzing if it is used by
malicious parties.
First of all, we develop a general greybox fuzzing framework called Fuzzing Orchestration
Toolkit (FOT). The design of FOT is inspired by American Fuzzy Lop (AFL),
which is currently the most widely used greybox fuzzer. The purpose of developing
FOT is that we want a modular greybox fuzzer for easy extension and trying out new
ideas. Compared to AFL, FOT is extensible and new features can be introduced with
significantly overhead on fuzzing efficiency Besides, with improved flexibility, the parallel
fuzzing mechanism of FOT is also different from AFL. In FOT, we divide the
greybox fuzzer into several different but connected components. These components are
interconnected while each of them is a stand alone module. This enables us to modify
FOT for experimental features with ease. With FOT and its extensions, we have found
dozens of previously unknown bugs in several widely used open source projects. By
discussing the design of FOT, we can establish a better understanding about greybox
fuzzing as a basis for comprehending the next works in this thesis.Second, we study an approach to improve the effectiveness of greybox fuzzing by
allowing it to penetrate through magic word comparisons in the target program. One
of the key bottlenecks in greybox fuzzing is that the fuzzers can hardly get to the true
branch of an if condition with magic word comparisons. This is because the chance
of generating a magic word via randomly mutating the input file is very low and often,
negligible. To address this challenge, we propose a divide-and-conquer approach of
splitting the magic word comparison into several small steps to guide the greybox fuzzer
through. The proposed approach is called Steelix. We conducted thorough experiments
on three different sets of programs/benchmarks and verfied the effectivness of Steelix.
Furthermore, with Steelix, we found one CVE and nine previously unknown bugs.
Third, we focus on an approach of adopting multi-objective optimization (MOO)
for seed selection of greybox fuzzing. This approach is called Cerebro. Seed selection
is an essential problem in greybox fuzzing. It directly affects how fast the entire fuzzing
process can converge and it is especially important if the target program is large. This is
because greybox fuzzers often need to maintain a large set of seed inputs when fuzzing
big programs, raising the importance of seed quality evaluation. In Cerebro, we model
the evaluation of seed quality as an MOO problem and propose a novel online algorithm
to solve the problem. Besides, we also propose a new property, called input potential
for seed quality evaluation in Cerebro. Our experiments showed that Cerebro can detect
around 50% more crashes than other state-of-the-art greybox fuzzers on average in 24
hours. With Cerebro, we found one CVE and fourteen previously unknown bugs.
Lastly, we concentrate on methods for anti-greybox fuzzing. In recent years, many
techniques have been proposed to improve the effectiveness and efficiency of greybox
fuzzing. On one hand, these advances help the security experts to detect vulnerabilities
earlier for patching. On the other hand, these techniques may be used by malicious
users to hunt vulnerabilities and build exploits for commercial-off-the-shelf-software
(COTS). To limit the usage of greybox fuzzing by malicious users, we propose a technique
called Vall-nut. In Vall-nut, instead of building strategies for different implementations
of greybox fuzzers, we generalize a model of greybox fuzzing and propose
counteractive strategies according to the key components of the model. The proposed strategies can hinder the greybox fuzzers by limiting their code coverage and bug detection
capability. To evaluate the performance of Vall-nut, we use two greybox fuzzers
and six real-world programs. With experiments, we showed that greybox fuzzers can
perform even worse than blackbox fuzzers on the target program protected by Vall-nut.
The experiments also showed that Vall-nut can reduce the number of crashes detected
by greybox fuzzers by 76% and reduce the code coverage by 34% on average. |
---|