Retrieval-augmented source code summarization

Automatic source code summarization aims to generate brief natural-language descriptions of code snippets in order to improve the software development workflow. This task is challenging because of the difficulty in matching highly structured source code to unstruct...

Bibliographic Details
Main Author: Tan, Jia Qing
Other Authors: Liu Yang
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
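Subjects: Engineering::Computer science and engineering::Software::Programming languages; Engineering::Computer science and engineering::Software::Software engineering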
Online Access: https://hdl.handle.net/10356/165901
Institution: Nanyang Technological University
id sg-ntu-dr.10356-165901
record_format dspace
spelling sg-ntu-dr.10356-165901 2023-04-21T15:36:50Z Retrieval-augmented source code summarization Tan, Jia Qing Liu Yang School of Computer Science and Engineering Cyber Security Lab (CSL) yangliu@ntu.edu.sg Engineering::Computer science and engineering::Software::Programming languages Engineering::Computer science and engineering::Software::Software engineering Automatic source code summarization aims to generate brief natural-language descriptions of code snippets in order to improve the software development workflow. The task is challenging because highly structured source code is difficult to match to unstructured natural language summaries. Traditional approaches rely on rule-based or retrieval-based methods, which generalize poorly, while deep learning models have not taken advantage of similar candidates in datasets. Recently, retrieval-augmented deep learning approaches have been proposed to combine the strengths of retrieval-based methods and deep learning models. However, these approaches require additional training to integrate the retrieved information with the input. In this thesis, we propose a retrieval-augmented method that does not require any extra training. We implement a new baseline model for CodeXGLUE code summarization tasks using GraphCodeBERT. Our method improves the baseline's BLEU-4 and perplexity scores by 0.6 and 6.7, respectively. Bachelor of Engineering (Computer Science) 2023-04-16T04:38:18Z 2023-04-16T04:38:18Z 2023 Final Year Project (FYP) Tan, J. Q. (2023). Retrieval-augmented source code summarization. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/165901 https://hdl.handle.net/10356/165901 en SCSE22-0579 application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Software::Programming languages
Engineering::Computer science and engineering::Software::Software engineering
spellingShingle Engineering::Computer science and engineering::Software::Programming languages
Engineering::Computer science and engineering::Software::Software engineering
Tan, Jia Qing
Retrieval-augmented source code summarization
description Automatic source code summarization aims to generate brief natural-language descriptions of code snippets in order to improve the software development workflow. The task is challenging because highly structured source code is difficult to match to unstructured natural language summaries. Traditional approaches rely on rule-based or retrieval-based methods, which generalize poorly, while deep learning models have not taken advantage of similar candidates in datasets. Recently, retrieval-augmented deep learning approaches have been proposed to combine the strengths of retrieval-based methods and deep learning models. However, these approaches require additional training to integrate the retrieved information with the input. In this thesis, we propose a retrieval-augmented method that does not require any extra training. We implement a new baseline model for CodeXGLUE code summarization tasks using GraphCodeBERT. Our method improves the baseline's BLEU-4 and perplexity scores by 0.6 and 6.7, respectively.
author2 Liu Yang
author_facet Liu Yang
Tan, Jia Qing
format Final Year Project
author Tan, Jia Qing
author_sort Tan, Jia Qing
title Retrieval-augmented source code summarization
title_short Retrieval-augmented source code summarization
title_full Retrieval-augmented source code summarization
title_fullStr Retrieval-augmented source code summarization
title_full_unstemmed Retrieval-augmented source code summarization
title_sort retrieval-augmented source code summarization
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/165901
_version_ 1764208072054538240