Retrieval-augmented source code summarization

Bibliographic Details
Main Author:	Tan, Jia Qing
Other Authors:	Liu Yang
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University, 2023
Subjects:	Engineering::Computer science and engineering::Software::Programming languages; Engineering::Computer science and engineering::Software::Software engineering
Online Access:	https://hdl.handle.net/10356/165901
Institution:	Nanyang Technological University

Abstract
Automatic source code summarization aims to generate brief natural-language descriptions of code snippets in order to improve the software development workflow. The task is challenging because highly structured source code must be matched to unstructured natural-language summaries. Traditional rule-based and retrieval-based approaches generalize poorly, while deep learning models do not take advantage of similar candidates available in the dataset. Retrieval-augmented deep learning approaches have recently been proposed to combine the strengths of retrieval-based methods and deep learning models, but they require additional training to integrate the retrieved information with the input. In this thesis, we propose a retrieval-augmented method that requires no extra training. We implement a new baseline model for the CodeXGLUE code summarization tasks using GraphCodeBERT. Our method improves the baseline's BLEU-4 score by 0.6 and its perplexity by 6.7.

Additional Details
School:	School of Computer Science and Engineering, Cyber Security Lab (CSL)
Contact:	Liu Yang (yangliu@ntu.edu.sg)
Degree:	Bachelor of Engineering (Computer Science)
Project Code:	SCSE22-0579
Date Available:	2023-04-16
File Format:	PDF
Collection:	DR-NTU (NTU Library)
Record ID:	sg-ntu-dr.10356-165901
Citation:	Tan, J. Q. (2023). Retrieval-augmented source code summarization. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/165901