Neural code generation for robust automatic program repair
Automatic program repair (APR) is crucial for reducing developers' manual debugging effort and improving software reliability. Consequently, it has gained increasing attention as an essential technique in software development to boost developers' productivity. Conventional search-based techniques...
Saved in:
Main Author: | Wang, Weishi |
---|---|
Other Authors: | Joty Shafiq Rayhan |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | Computer and Information Science |
Online Access: | https://hdl.handle.net/10356/173910 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-173910 |
record_format |
dspace |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Computer and Information Science |
description |
Automatic program repair (APR) is crucial for reducing developers' manual debugging effort and improving software reliability. Consequently, it has gained increasing
attention as an essential technique in software development to boost developers’
productivity. Conventional search-based techniques typically rely on heuristic rules
or a redundancy assumption to mine fix patterns, which continuously generate
code patches until the resultant program meets the pre-defined test specifications.
However, these approaches often yield a large number of low-quality patch
candidates, leading to ineffective APR systems. To address these limitations,
inspired by the surge of deep learning (DL) based approaches for natural language
processing (NLP), we focus on robust APR methods in real-world scenarios via
neural code generation. In this thesis, we leverage recent advances in deep learning
and deploy novel transformer-based frameworks to automate the program repair
process in a data-driven manner. This thesis makes several contributions
toward the development of robust automatic program repair.
First, we propose CodeT5, a novel code-aware, encoder-decoder-based pre-trained programming language model that supports both code understanding and generation tasks. Specifically, we adopt the unified encoder-decoder Transformer architecture of T5 and incorporate code-specific knowledge for better code representation
and understanding. Furthermore, we propose a novel identifier-aware pre-training
task that enables the model to distinguish which code tokens are identifiers and to
recover them when they are masked. In addition, we exploit user-written
code comments with a bimodal dual-generation task for better natural language
(NL)-programming language (PL) alignment. Comprehensive experiments show
that CodeT5 significantly outperforms prior methods on understanding tasks such
as code defect detection and clone detection, and generation tasks across various
directions including PL-NL, NL-PL, and PL-PL. Further analysis reveals that our
model can better capture semantic information from code.
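The identifier-aware objective described above can be sketched roughly as follows. The tokenization, the identifier flags, and the T5-style `<extra_id_n>` sentinels are illustrative assumptions rather than CodeT5's exact preprocessing; the point is that identifier tokens are masked and must be recovered, with repeated identifiers sharing one sentinel:

```python
# Hedged sketch of identifier-aware masked span prediction (T5-style).
def mask_identifiers(tokens, is_identifier):
    """Replace identifier tokens with sentinels; the target recovers them."""
    source, target = [], []
    seen = {}  # same identifier -> same sentinel, so the model must track names
    for tok, is_id in zip(tokens, is_identifier):
        if is_id:
            if tok not in seen:
                seen[tok] = f"<extra_id_{len(seen)}>"
                target.extend([seen[tok], tok])  # sentinel followed by its identifier
            source.append(seen[tok])
        else:
            source.append(tok)
    return source, target

# Toy example: mask the identifiers in `def add(a, b): return a + b`.
src, tgt = mask_identifiers(
    ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"],
    [False, True, False, True, False, True, False, False, False, True, False, True],
)
```

Note that both occurrences of `a` (and of `b`) map to the same sentinel, so recovering the target requires distinguishing identifiers from other code tokens.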
Second, we investigate the effectiveness of leveraging bug-fix patterns for automatic
program repair. We propose a novel Retrieval-Augmented Patch Generation
framework (RAP-Gen) by explicitly leveraging relevant fix patterns retrieved from
a codebase of previous bug-fix pairs. Specifically, we build a hybrid patch retriever
to account for both lexical and semantic matching based on the raw source code in a
language-agnostic manner, which does not rely on any code-specific features. In addition, we adapt our code-aware language model CodeT5 as the foundation model
to facilitate both patch retrieval and generation tasks in a unified manner. Notably,
RAP-Gen is a generic APR framework that can flexibly integrate different patch
retrievers and generators to repair various types of bugs. We thoroughly evaluate
RAP-Gen on three benchmarks in two programming languages, including the TFix
benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java,
where the bug localization information may or may not be provided. Experimental results show that RAP-Gen significantly outperforms previous state-of-the-art
(SoTA) approaches on all benchmarks, e.g., boosting the accuracy of T5-large on
TFix from 49.70% to 54.15% (repairing 478 more bugs) and repairing 15 more bugs
on 818 Defects4J bugs. Further analysis reveals that our patch retriever can search
for relevant fix patterns to guide the APR systems.
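A minimal sketch of the hybrid retrieval idea: a lexical score and a semantic score over raw source code are combined to rank past bug-fix pairs. Here a token-overlap (Jaccard) score stands in for the sparse lexical retriever and a bag-of-words cosine stands in for the learned dense retriever; these scorers and the equal weighting `alpha` are stand-in assumptions, not RAP-Gen's actual retriever:

```python
from collections import Counter
import math

def lexical_score(query_toks, doc_toks):
    """Jaccard overlap of token sets (stand-in for a sparse lexical retriever)."""
    q, d = set(query_toks), set(doc_toks)
    return len(q & d) / len(q | d) if q | d else 0.0

def cosine(u, v):
    """Cosine similarity of sparse count vectors (stand-in for dense embeddings)."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(buggy_code, codebase, alpha=0.5):
    """Return (score, bug, fix) of the best-matching past bug-fix pair."""
    q_toks = buggy_code.split()
    q_vec = Counter(q_toks)

    def score(bug):
        d_toks = bug.split()
        return alpha * lexical_score(q_toks, d_toks) + (1 - alpha) * cosine(q_vec, Counter(d_toks))

    bug, fix = max(codebase, key=lambda pair: score(pair[0]))
    return score(bug), bug, fix

# Toy codebase of (buggy, fixed) pairs; the retrieved fix then guides generation.
codebase = [
    ("if x == None :", "if x is None :"),
    ("for i in range ( n )", "for i in range ( len ( xs ) )"),
]
score, bug, fix = retrieve("if y == None :", codebase)
```

Because both scorers operate on raw token strings, the same ranking logic applies to any programming language, which mirrors the language-agnostic design described above.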
Third, we focus on a novel task of low-resource APR. Recent advances in deep
learning (DL) based models have demonstrated promising results by learning from
large-scale bug-fix examples in a data-driven manner. However, in practical scenarios, software bugs have an imbalanced distribution, and the fixing knowledge
learned by APR models often only captures the patterns of frequent error types,
making them ill-suited to rare error types. To address this limitation, we propose Meta-APR, a new meta-learning framework integrated with code
pretrained language models to generate fixes for low-resource bugs with limited
training samples. Extensive experimental results on three benchmarks in various
programming languages verify the superiority of our method over existing DL-based
APR approaches.
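The meta-learning intuition can be illustrated with a first-order (Reptile-style) update on a toy one-parameter model, where each "task" stands in for one error type: meta-training finds an initialization close to all tasks, so a few gradient steps on limited samples adapt it to a rare type. The model, data, and learning rates are made up for illustration and are not Meta-APR itself:

```python
# First-order meta-learning sketch on a toy 1-D linear model y = w * x.
def inner_sgd(w, task, lr=0.1, steps=20):
    """Task-specific adaptation: plain SGD on squared error."""
    for _ in range(steps):
        for x, y in task:
            w -= lr * 2 * (w * x - y) * x  # gradient of (w*x - y)**2
    return w

def meta_train(tasks, meta_lr=0.5, rounds=50):
    """Reptile-style outer loop: nudge the init toward each adapted solution."""
    w = 0.0
    for _ in range(rounds):
        for task in tasks:
            w_adapted = inner_sgd(w, task)
            w += meta_lr * (w_adapted - w)  # move init toward this task's optimum
    return w

# Two "error types" whose ideal slopes are 2.0 and 3.0; the meta-learned
# initialization lands between them, so adapting to an unseen low-resource
# type with only two samples takes just a few inner steps.
tasks = [[(1.0, 2.0), (2.0, 4.0)], [(1.0, 3.0), (2.0, 6.0)]]
w0 = meta_train(tasks)
```

The outer update deliberately avoids second-order gradients, one common design choice when the inner learner (here SGD, in Meta-APR a pretrained code model) is expensive to differentiate through.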
Finally, we explore xCodeEval, the largest executable multilingual
multitask benchmark to date, consisting of 25M document-level coding examples
from about 7.5K unique problems covering up to 17 programming languages with
execution-level parallelism. We propose a novel APR task to synthesize a fix for a
detected program bug. Specifically, given a bug-specific defect, the objective of this
task is to generate a correct fix that passes all the unit tests. Detailed experiments
demonstrate that our proposed APR task offers a fresh perspective for examining
and analyzing large language model (LLM)-based APR, facilitating comprehensive
and to some extent interpretable investigations of their repair performance.
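The execution-based acceptance criterion of this APR task can be sketched as follows: a candidate fix counts as plausible only if the repaired program passes every unit test. The buggy program, candidate patches, and tests below are hypothetical examples, not drawn from xCodeEval:

```python
# Minimal sketch of unit-test-based patch validation.
def passes_all_tests(candidate_src, tests, entry="absolute"):
    """Execute a candidate fix and check it against every unit test."""
    ns = {}
    try:
        exec(candidate_src, ns)                 # compile and load the candidate
        return all(t(ns[entry]) for t in tests) # every test must pass
    except Exception:
        return False                            # crashing candidates are rejected

# Hypothetical unit tests for an absolute-value function.
tests = [
    lambda f: f(-3) == 3,
    lambda f: f(4) == 4,
    lambda f: f(0) == 0,
]

candidates = [
    "def absolute(x):\n    return x",                    # the original bug
    "def absolute(x):\n    return -x if x < 0 else x",   # a candidate fix
]
plausible = [c for c in candidates if passes_all_tests(c, tests)]
```

Executing candidates against tests, rather than comparing them textually to a reference patch, is what makes the benchmark's evaluation language-agnostic and directly tied to program behavior.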
This thesis strives for robust neural code generation across multiple languages
and tasks, facilitating real-world APR tasks to alleviate manual debugging efforts
for everyone regardless of their coding background. |
author2 |
Joty Shafiq Rayhan |
format |
Thesis-Doctor of Philosophy |
author |
Wang, Weishi |
author_sort |
Wang, Weishi |
title |
Neural code generation for robust automatic program repair |
publisher |
Nanyang Technological University |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/173910 |
_version_ |
1800916256977584128 |
spelling |
sg-ntu-dr.10356-173910 2024-04-09T03:58:57Z Neural code generation for robust automatic program repair Wang, Weishi Joty Shafiq Rayhan Luu Anh Tuan School of Computer Science and Engineering Salesforce Research Asia Steven Hoi anhtuan.luu@ntu.edu.sg, srjoty@ntu.edu.sg Computer and Information Science Doctor of Philosophy 2024-03-06T07:35:12Z 2024-03-06T07:35:12Z 2024 Thesis-Doctor of Philosophy Wang, W. (2024). Neural code generation for robust automatic program repair. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/173910 10.32657/10356/173910 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University |