How effective are they? Exploring large language model based fuzz driver generation

Fuzz drivers are essential for library API fuzzing. However, automatically generating fuzz drivers is a complex task, as it demands the creation of high-quality, correct, and robust API usage code. An LLM-based (Large Language Model) approach for generating fuzz drivers is a promising area of research. Unlike traditional program analysis-based generators, this text-based approach is more generalized and capable of harnessing a variety of API usage information, resulting in code that is friendly for human readers. However, there is still a lack of understanding regarding the fundamental issues in this direction, such as its effectiveness and potential challenges. To bridge this gap, we conducted the first in-depth study targeting the important issues of using LLMs to generate effective fuzz drivers. Our study features a curated dataset with 86 fuzz driver generation questions from 30 widely-used C projects. Six prompting strategies are designed and tested across five state-of-the-art LLMs with five different temperature settings. In total, our study evaluated 736,430 generated fuzz drivers, with 0.85 billion token costs ($8,000+ charged tokens). Additionally, we compared the LLM-generated drivers against those used in industry, conducting extensive fuzzing experiments (3.75 CPU-years). Our study uncovered that: 1) While LLM-based fuzz driver generation is a promising direction, it still encounters several obstacles towards practical applications; 2) LLMs face difficulties in generating effective fuzz drivers for APIs with intricate specifics. Three featured design choices of prompt strategies can be beneficial: issuing repeat queries, querying with examples, and employing an iterative querying process; 3) While LLM-generated drivers can yield fuzzing outcomes that are on par with those used in industry, there are substantial opportunities for enhancement, such as extending contained API usage or integrating semantic oracles to facilitate logical bug detection.
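
For readers unfamiliar with the term, a fuzz driver is a small harness that feeds fuzzer-generated bytes into a library API under test. The sketch below shows the typical shape of a libFuzzer-style driver in C; the mylib.h header and the mylib_parse_buffer/mylib_free_result functions are hypothetical placeholders for illustration, not APIs from the paper's dataset.

    // Minimal libFuzzer-style fuzz driver sketch (illustrative only).
    // "mylib.h", mylib_parse_buffer, and mylib_free_result are hypothetical
    // placeholders, not APIs studied in the paper.
    #include <stddef.h>
    #include <stdint.h>
    #include "mylib.h"

    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
        if (size == 0) {
            return 0;  // nothing to feed to the API
        }
        // Typical driver shape: call the API under test with fuzzer-provided
        // bytes, then release any resources it allocated.
        mylib_result_t *res = mylib_parse_buffer(data, size);
        if (res != NULL) {
            mylib_free_result(res);
        }
        return 0;
    }

A driver like this is compiled and linked against the library with a fuzzing engine (e.g., libFuzzer), which then repeatedly invokes it with mutated inputs; the paper studies how well LLMs can author such harnesses automatically.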

Bibliographic Details
Main Authors: ZHANG, Cen, ZHENG, Yaowen, BAI, Mingqiang, LI, Yeting, MA, Wei, XIE, Xiaofei
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2024
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/9508
https://ink.library.smu.edu.sg/context/sis_research/article/10508/viewcontent/How_Effective_Are_They__Exploring_Large_Language_Model_Based_Fuzz_Driver_Generation.pdf
Institution: Singapore Management University
Language: English
id sg-smu-ink.sis_research-10508
record_format dspace
spelling sg-smu-ink.sis_research-10508 2024-11-15T07:44:27Z How effective are they? Exploring large language model based fuzz driver generation ZHANG, Cen ZHENG, Yaowen BAI, Mingqiang LI, Yeting MA, Wei XIE, Xiaofei Fuzz drivers are essential for library API fuzzing. However, automatically generating fuzz drivers is a complex task, as it demands the creation of high-quality, correct, and robust API usage code. An LLM-based (Large Language Model) approach for generating fuzz drivers is a promising area of research. Unlike traditional program analysis-based generators, this text-based approach is more generalized and capable of harnessing a variety of API usage information, resulting in code that is friendly for human readers. However, there is still a lack of understanding regarding the fundamental issues in this direction, such as its effectiveness and potential challenges. To bridge this gap, we conducted the first in-depth study targeting the important issues of using LLMs to generate effective fuzz drivers. Our study features a curated dataset with 86 fuzz driver generation questions from 30 widely-used C projects. Six prompting strategies are designed and tested across five state-of-the-art LLMs with five different temperature settings. In total, our study evaluated 736,430 generated fuzz drivers, with 0.85 billion token costs ($8,000+ charged tokens). Additionally, we compared the LLM-generated drivers against those used in industry, conducting extensive fuzzing experiments (3.75 CPU-years). Our study uncovered that: 1) While LLM-based fuzz driver generation is a promising direction, it still encounters several obstacles towards practical applications; 2) LLMs face difficulties in generating effective fuzz drivers for APIs with intricate specifics. Three featured design choices of prompt strategies can be beneficial: issuing repeat queries, querying with examples, and employing an iterative querying process; 3) While LLM-generated drivers can yield fuzzing outcomes that are on par with those used in industry, there are substantial opportunities for enhancement, such as extending contained API usage or integrating semantic oracles to facilitate logical bug detection. 2024-09-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/9508 info:doi/10.1145/3650212.3680355 https://ink.library.smu.edu.sg/context/sis_research/article/10508/viewcontent/How_Effective_Are_They__Exploring_Large_Language_Model_Based_Fuzz_Driver_Generation.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Fuzz driver generation Fuzz testing Large language model Artificial Intelligence and Robotics Software Engineering
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Fuzz driver generation
Fuzz testing
Large language model
Artificial Intelligence and Robotics
Software Engineering
spellingShingle Fuzz driver generation
Fuzz testing
Large language model
Artificial Intelligence and Robotics
Software Engineering
ZHANG, Cen
ZHENG, Yaowen
BAI, Mingqiang
LI, Yeting
MA, Wei
XIE, Xiaofei
How effective are they? Exploring large language model based fuzz driver generation
description Fuzz drivers are essential for library API fuzzing. However, automatically generating fuzz drivers is a complex task, as it demands the creation of high-quality, correct, and robust API usage code. An LLM-based (Large Language Model) approach for generating fuzz drivers is a promising area of research. Unlike traditional program analysis-based generators, this text-based approach is more generalized and capable of harnessing a variety of API usage information, resulting in code that is friendly for human readers. However, there is still a lack of understanding regarding the fundamental issues in this direction, such as its effectiveness and potential challenges. To bridge this gap, we conducted the first in-depth study targeting the important issues of using LLMs to generate effective fuzz drivers. Our study features a curated dataset with 86 fuzz driver generation questions from 30 widely-used C projects. Six prompting strategies are designed and tested across five state-of-the-art LLMs with five different temperature settings. In total, our study evaluated 736,430 generated fuzz drivers, with 0.85 billion token costs ($8,000+ charged tokens). Additionally, we compared the LLM-generated drivers against those used in industry, conducting extensive fuzzing experiments (3.75 CPU-years). Our study uncovered that: 1) While LLM-based fuzz driver generation is a promising direction, it still encounters several obstacles towards practical applications; 2) LLMs face difficulties in generating effective fuzz drivers for APIs with intricate specifics. Three featured design choices of prompt strategies can be beneficial: issuing repeat queries, querying with examples, and employing an iterative querying process; 3) While LLM-generated drivers can yield fuzzing outcomes that are on par with those used in industry, there are substantial opportunities for enhancement, such as extending contained API usage or integrating semantic oracles to facilitate logical bug detection.
format text
author ZHANG, Cen
ZHENG, Yaowen
BAI, Mingqiang
LI, Yeting
MA, Wei
XIE, Xiaofei
author_facet ZHANG, Cen
ZHENG, Yaowen
BAI, Mingqiang
LI, Yeting
MA, Wei
XIE, Xiaofei
author_sort ZHANG, Cen
title How effective are they? Exploring large language model based fuzz driver generation
title_short How effective are they? Exploring large language model based fuzz driver generation
title_full How effective are they? Exploring large language model based fuzz driver generation
title_fullStr How effective are they? Exploring large language model based fuzz driver generation
title_full_unstemmed How effective are they? Exploring large language model based fuzz driver generation
title_sort how effective are they? exploring large language model based fuzz driver generation
publisher Institutional Knowledge at Singapore Management University
publishDate 2024
url https://ink.library.smu.edu.sg/sis_research/9508
https://ink.library.smu.edu.sg/context/sis_research/article/10508/viewcontent/How_Effective_Are_They__Exploring_Large_Language_Model_Based_Fuzz_Driver_Generation.pdf
_version_ 1816859116387172352