Effective fuzz testing via model-guided program synthesis

Bibliographic Details
Main Author: Zhang, Cen
Other Authors: Liu Yang
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2023
Subjects:
Online Access: https://hdl.handle.net/10356/169112
Institution: Nanyang Technological University
id sg-ntu-dr.10356-169112
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Software::Software engineering
description Fuzzing has been widely recognized as the most effective method for finding vulnerabilities. Many security researchers have proposed methods for improving fuzzers in order to detect more types of vulnerabilities, achieve higher coverage, or find more bugs in a given amount of time. However, there has been relatively little research on the gap between the diverse requirements of testing different types of targets and the general assumptions made by existing fuzzers. This gap is caused by mismatches between the interfaces provided by the targets and the interfaces assumed by the fuzzers. For example, firmware targets are difficult to test because they lack execution environments, and fuzzing libraries requires fuzz drivers, which are pieces of code that properly use library APIs. Effectively fuzzing these types of targets requires both engineering effort and new methods. In this thesis, we present four works that address this gap in the context of three types of targets: binaries, kernel modules inside firmware, and libraries.
First, we developed a general fuzzing framework called Binary Fuzzing Framework (BiFF). The goal of BiFF is to provide a more effective fuzzer for binary-only fuzzing scenarios, giving us an enhanced tool set for the subsequent research. It is designed to satisfy the following requirements: 1) support binary-only fuzzing, 2) support all mainstream CPU architectures, 3) have high performance, and 4) be flexible enough to fuzz server-like targets such as Android APKs or Linux daemons. To accomplish this, we developed our own dynamic binary instrumentation engine, which has an instrumentation mechanism optimized for fuzzing tasks and supports six CPU architectures (Intel X86-32, Intel X86-64, ARM-32, ARM-64, MIPS-64, and PowerPC-64). In addition to the engine, BiFF modularizes the fuzzing workflow and introduces a new feature called attach mode fuzzing, which allows customized fuzzing start and end points. With appropriate configuration, this feature enables the fuzzing of server-like targets. Finally, BiFF also incorporates several advanced fuzzing algorithms. The experiments showed that BiFF has superior performance when fuzzing traditional targets and can support the fuzzing of server-like targets. BiFF has been used by the testing team at Huawei, where it discovered over twenty unique vulnerabilities in their products within two months. As mentioned above, BiFF served as a basic tool for the other works discussed in this thesis.
Second, we discuss FirmGuide, an emulation framework that addresses the problem of manually creating execution environments for fuzzing kernel modules in Linux-based firmware using QEMU, which has limited support for certain types of devices. To improve QEMU's capabilities on Linux-based firmware, FirmGuide uses a semi-automated emulation approach that first studies the code of the Linux kernel to identify the peripherals that determine the outcome of the boot process. A successful kernel boot indicates a successful emulation, which enables basic dynamic analysis of kernel modules. FirmGuide then divides the emulation of a particular type of peripheral into two steps: the manual induction of a model template and the automatic extraction of model parameters. The model template represents patterns of kernel-hardware interactions, while the model parameters represent the actual content of those interactions, which can be extracted from the kernel driver's code through symbolic execution. To date, FirmGuide has created dynamic analysis environments for 5,947 firmware images with a success rate of 95.8%. These firmware images cover more than 26 Systems-on-Chip, 2 CPU architectures, and 22 versions of the Linux kernel, and have been used in applications such as vulnerability analysis and fuzzing to demonstrate potential usage scenarios.
Third, we introduce APICraft, an automated tool that synthesizes fuzz drivers for closed-source libraries. The generated fuzz drivers use the library APIs correctly, allowing existing fuzzers to test the libraries by fuzzing the drivers. There are two main challenges in this process: 1) limited information is available for synthesizing fuzz drivers for closed-source libraries; and 2) the complex semantic relationships among the API functions must be maintained while ensuring their correctness. To address these challenges, APICraft employs a collect-combine approach that starts by learning API data and control dependencies from traces of programs that use the target library. Next, it uses a bottom-up approach and a multi-objective genetic algorithm to synthesize the fuzz drivers by randomly linking the learned dependencies and converting them into drivers. The linked dependencies are then improved based on scores from several independent metrics. Our experiments found that the fuzz drivers generated by this approach were more effective than manually written ones, discovering 64% more basic blocks and 12 additional unique crashes within 24 hours on average. In a long-term fuzzing campaign using the generated drivers, we identified 142 vulnerabilities (54 CVEs) in the macOS SDK, affecting commercial products such as Safari and Preview.
Last, we present Rubick as a solution to the challenges faced by current fuzz driver generators. Rubick aims to automatically generate control-flow-sensitive fuzz drivers. A fuzz driver for library APIs is control-flow-sensitive if its API usage involves control flow conditions such as branches and loops; a typical case is to check the value of a pointer returned by an API and end the execution if it is a null pointer. Generating such drivers raises the following issues: 1) incorporating and utilizing control dependencies in API usage; 2) handling noise in learned API usage, especially for complex real-world consumer programs; and 3) coordinating independent sets of API usage within the fuzz driver to work effectively with fuzzers. To address these challenges, Rubick uses an automata-guided approach that has three main features: 1) representing API usage (including data and control dependencies) as a deterministic finite automaton; 2) using an active automata learning algorithm to refine the learned API usage; and 3) synthesizing a single automata-guided fuzz driver that provides a scheduling interface for the fuzzer to test independent sets of API usage during fuzzing. Our experiments showed that the fuzz drivers generated by Rubick significantly outperformed the baselines, covering an average of 50.42% more edges than those generated by FuzzGen and 44.58% more edges than fuzz drivers written manually for OSS-Fuzz or by human experts. By learning from large-scale open source projects, Rubick generated fuzz drivers for 11 popular Java projects, two of which have been merged into OSS-Fuzz. So far, these fuzz drivers have identified 199 bugs, including four CVEs, that could potentially affect popular PC and Android software with tens of millions of downloads.
spelling sg-ntu-dr.10356-169112 2023-07-04T01:52:13Z Effective fuzz testing via model-guided program synthesis Zhang, Cen Liu Yang School of Computer Science and Engineering Cyber Security Lab yangliu@ntu.edu.sg Engineering::Computer science and engineering::Software::Software engineering Doctor of Philosophy 2023-06-30T08:27:25Z 2023-06-30T08:27:25Z 2023 Thesis-Doctor of Philosophy
Zhang, C. (2023). Effective fuzz testing via model-guided program synthesis. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/169112 10.32657/10356/169112 en
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University