Neural code generation for robust automatic program repair
Automatic program repair (APR) is crucial for reducing developers' manual debugging effort and improving software reliability. Consequently, it has gained increasing attention as an essential technique in software development to boost developers' productivity. Conventional search-based techniques...
Saved in:
Main Author: | Wang, Weishi |
---|---|
Other Authors: | Joty Shafiq Rayhan |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | Computer and Information Science |
Online Access: | https://hdl.handle.net/10356/173910 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-173910 |
record_format |
dspace |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Computer and Information Science |
description |
Automatic program repair (APR) is crucial for reducing developers' manual debugging effort and improving software reliability. Consequently, it has gained increasing
attention as an essential technique in software development to boost developers’
productivity. Conventional search-based techniques typically rely on heuristic rules
or a redundancy assumption to mine fix patterns, which continuously generate
code patches until the resultant program meets the pre-defined test specifications.
However, these approaches often yield a large number of low-quality patch
candidates, leading to ineffective APR systems. To address these limitations,
inspired by the surge of deep learning (DL) based approaches for natural language
processing (NLP), we focus on robust APR methods in real-world scenarios via
neural code generation. In this thesis, we leverage recent advances in deep learning
and deploy novel transformer-based frameworks to automate the program repair
process in a data-driven manner. This thesis makes several contributions
toward the development of robust automatic program repair.
First, we propose CodeT5, a novel code-aware, encoder-decoder-based pre-trained programming language model that supports both code understanding and generation tasks. Specifically, we adopt the unified encoder-decoder Transformer architecture of T5 and incorporate code-specific knowledge for better code representation
and understanding. Furthermore, we propose a novel identifier-aware pre-training
task that enables the model to distinguish which code tokens are identifiers and to
recover them when they are masked. In addition, we exploit user-written
code comments with a bimodal dual-generation task for better natural language
(NL)-programming language (PL) alignment. Comprehensive experiments show
that CodeT5 significantly outperforms prior methods on understanding tasks such
as code defect detection and clone detection, and generation tasks across various
directions including PL-NL, NL-PL, and PL-PL. Further analysis reveals that our
model can better capture semantic information from code.
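The identifier-aware objective described above can be sketched roughly as follows. The tokenization, the identifier flags, and the T5-style `<extra_id_n>` sentinels are illustrative assumptions rather than CodeT5's exact preprocessing; the point is that identifier tokens are masked and must be recovered, with repeated identifiers sharing one sentinel:

```python
# Hedged sketch of identifier-aware masked span prediction (T5-style).
def mask_identifiers(tokens, is_identifier):
    """Replace identifier tokens with sentinels; the target recovers them."""
    source, target = [], []
    seen = {}  # same identifier -> same sentinel, so the model must track names
    for tok, is_id in zip(tokens, is_identifier):
        if is_id:
            if tok not in seen:
                seen[tok] = f"<extra_id_{len(seen)}>"
                target.extend([seen[tok], tok])  # sentinel followed by its identifier
            source.append(seen[tok])
        else:
            source.append(tok)
    return source, target

# Toy example: mask the identifiers in `def add(a, b): return a + b`.
src, tgt = mask_identifiers(
    ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"],
    [False, True, False, True, False, True, False, False, False, True, False, True],
)
```

Note that both occurrences of `a` (and of `b`) map to the same sentinel, so recovering the target requires distinguishing identifiers from other code tokens.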
Second, we investigate the effectiveness of leveraging bug-fix patterns for automatic
program repair. We propose a novel Retrieval-Augmented Patch Generation
framework (RAP-Gen) by explicitly leveraging relevant fix patterns retrieved from
a codebase of previous bug-fix pairs. Specifically, we build a hybrid patch retriever
to account for both lexical and semantic matching based on the raw source code in a
language-agnostic manner, which does not rely on any code-specific features. In addition, we adapt our code-aware language model CodeT5 as the foundation model
to facilitate both patch retrieval and generation tasks in a unified manner. Notably,
RAP-Gen is a generic APR framework that can flexibly integrate different patch
retrievers and generators to repair various types of bugs. We thoroughly evaluate
RAP-Gen on three benchmarks in two programming languages, including the TFix
benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java,
where the bug localization information may or may not be provided. Experimental results show that RAP-Gen significantly outperforms previous state-of-the-art
(SoTA) approaches on all benchmarks, e.g., boosting the accuracy of T5-large on
TFix from 49.70% to 54.15% (repairing 478 more bugs) and repairing 15 more bugs
on 818 Defects4J bugs. Further analysis reveals that our patch retriever can search
for relevant fix patterns to guide the APR systems.
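A minimal sketch of the hybrid retrieval idea: a lexical score and a semantic score over raw source code are combined to rank past bug-fix pairs. Here a token-overlap (Jaccard) score stands in for the sparse lexical retriever and a bag-of-words cosine stands in for the learned dense retriever; these scorers and the equal weighting `alpha` are stand-in assumptions, not RAP-Gen's actual retriever:

```python
from collections import Counter
import math

def lexical_score(query_toks, doc_toks):
    """Jaccard overlap of token sets (stand-in for a sparse lexical retriever)."""
    q, d = set(query_toks), set(doc_toks)
    return len(q & d) / len(q | d) if q | d else 0.0

def cosine(u, v):
    """Cosine similarity of sparse count vectors (stand-in for dense embeddings)."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(buggy_code, codebase, alpha=0.5):
    """Return (score, bug, fix) of the best-matching past bug-fix pair."""
    q_toks = buggy_code.split()
    q_vec = Counter(q_toks)

    def score(bug):
        d_toks = bug.split()
        return alpha * lexical_score(q_toks, d_toks) + (1 - alpha) * cosine(q_vec, Counter(d_toks))

    bug, fix = max(codebase, key=lambda pair: score(pair[0]))
    return score(bug), bug, fix

# Toy codebase of (buggy, fixed) pairs; the retrieved fix then guides generation.
codebase = [
    ("if x == None :", "if x is None :"),
    ("for i in range ( n )", "for i in range ( len ( xs ) )"),
]
score, bug, fix = retrieve("if y == None :", codebase)
```

Because both scorers operate on raw token strings, the same ranking logic applies to any programming language, which mirrors the language-agnostic design described above.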
Third, we focus on a novel task of low-resource APR. Recent advances in deep
learning (DL) based models have demonstrated promising results by learning from
large-scale bug-fix examples in a data-driven manner. However, in practical scenarios, software bugs have an imbalanced distribution, and the fixing knowledge
learned by APR models often only captures the patterns of frequent error types,
making them ill-suited to rare error types. To address this limitation, we propose Meta-APR, a new meta-learning framework integrated with code
pretrained language models to generate fixes for low-resource bugs with limited
training samples. Extensive experimental results on three benchmarks in various
programming languages verify the superiority of our method over existing DL-based
APR approaches.
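The meta-learning intuition can be illustrated with a first-order (Reptile-style) update on a toy one-parameter model, where each "task" stands in for one error type: meta-training finds an initialization close to all tasks, so a few gradient steps on limited samples adapt it to a rare type. The model, data, and learning rates are made up for illustration and are not Meta-APR itself:

```python
# First-order meta-learning sketch on a toy 1-D linear model y = w * x.
def inner_sgd(w, task, lr=0.1, steps=20):
    """Task-specific adaptation: plain SGD on squared error."""
    for _ in range(steps):
        for x, y in task:
            w -= lr * 2 * (w * x - y) * x  # gradient of (w*x - y)**2
    return w

def meta_train(tasks, meta_lr=0.5, rounds=50):
    """Reptile-style outer loop: nudge the init toward each adapted solution."""
    w = 0.0
    for _ in range(rounds):
        for task in tasks:
            w_adapted = inner_sgd(w, task)
            w += meta_lr * (w_adapted - w)  # move init toward this task's optimum
    return w

# Two "error types" whose ideal slopes are 2.0 and 3.0; the meta-learned
# initialization lands between them, so adapting to an unseen low-resource
# type with only two samples takes just a few inner steps.
tasks = [[(1.0, 2.0), (2.0, 4.0)], [(1.0, 3.0), (2.0, 6.0)]]
w0 = meta_train(tasks)
```

The outer update deliberately avoids second-order gradients, one common design choice when the inner learner (here SGD, in Meta-APR a pretrained code model) is expensive to differentiate through.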
Finally, we explore xCodeEval, the largest executable multilingual
multitask benchmark to date, consisting of 25M document-level coding examples
from about 7.5K unique problems covering up to 17 programming languages with
execution-level parallelism. We propose a novel APR task to synthesize a fix for a
detected program bug. Specifically, given a bug-specific defect, the objective of this
task is to generate a correct fix that passes all the unit tests. Detailed experiments
demonstrate that our proposed APR task offers a fresh perspective for examining
and analyzing large language model (LLM)-based APR, facilitating comprehensive
and to some extent interpretable investigations of their repair performance.
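The execution-based acceptance criterion of this APR task can be sketched as follows: a candidate fix counts as plausible only if the repaired program passes every unit test. The buggy program, candidate patches, and tests below are hypothetical examples, not drawn from xCodeEval:

```python
# Minimal sketch of unit-test-based patch validation.
def passes_all_tests(candidate_src, tests, entry="absolute"):
    """Execute a candidate fix and check it against every unit test."""
    ns = {}
    try:
        exec(candidate_src, ns)                 # compile and load the candidate
        return all(t(ns[entry]) for t in tests) # every test must pass
    except Exception:
        return False                            # crashing candidates are rejected

# Hypothetical unit tests for an absolute-value function.
tests = [
    lambda f: f(-3) == 3,
    lambda f: f(4) == 4,
    lambda f: f(0) == 0,
]

candidates = [
    "def absolute(x):\n    return x",                    # the original bug
    "def absolute(x):\n    return -x if x < 0 else x",   # a candidate fix
]
plausible = [c for c in candidates if passes_all_tests(c, tests)]
```

Executing candidates against tests, rather than comparing them textually to a reference patch, is what makes the benchmark's evaluation language-agnostic and directly tied to program behavior.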
This thesis strives for robust neural code generation across multiple languages
and tasks, facilitating real-world APR tasks to alleviate manual debugging efforts
for everyone regardless of their coding background. |
author2 |
Joty Shafiq Rayhan |
format |
Thesis-Doctor of Philosophy |
author |
Wang, Weishi |
author_sort |
Wang, Weishi |
title |
Neural code generation for robust automatic program repair |
publisher |
Nanyang Technological University |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/173910 |
_version_ |
1800916256977584128 |
spelling |
sg-ntu-dr.10356-173910 2024-04-09T03:58:57Z Neural code generation for robust automatic program repair Wang, Weishi Joty Shafiq Rayhan Luu Anh Tuan School of Computer Science and Engineering Salesforce Research Asia Steven Hoi anhtuan.luu@ntu.edu.sg, srjoty@ntu.edu.sg Computer and Information Science Doctor of Philosophy 2024-03-06T07:35:12Z 2024-03-06T07:35:12Z 2024 Thesis-Doctor of Philosophy Wang, W. (2024). Neural code generation for robust automatic program repair. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/173910 10.32657/10356/173910 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University |