Retrieval-augmented source code summarization
Automatic source code summarization aims to generate brief natural-language descriptions of code snippets in order to streamline software development. The task is challenging because highly structured source code is difficult to map to unstruct...
Main Author: | Tan, Jia Qing |
---|---|
Other Authors: | Liu Yang |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University 2023 |
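Subjects: | Engineering::Computer science and engineering::Software::Programming languages; Engineering::Computer science and engineering::Software::Software engineering |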
Online Access: | https://hdl.handle.net/10356/165901 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-165901 |
---|---|
record_format | dspace |
spelling | sg-ntu-dr.10356-165901; 2023-04-21T15:36:50Z; Retrieval-augmented source code summarization; Tan, Jia Qing; Liu Yang; School of Computer Science and Engineering; Cyber Security Lab (CSL); yangliu@ntu.edu.sg; Engineering::Computer science and engineering::Software::Programming languages; Engineering::Computer science and engineering::Software::Software engineering; The goal of automatic source code summarization is to create brief descriptions using natural language, which are based on code snippets, in order to improve the workflow of software development. This task is challenging because of the difficulty in matching highly structured source code to unstructured natural language summaries. Traditional approaches rely on rule-based or retrieval-based methods, but they have low generalization capability, while deep learning models have not taken advantage of similar candidates in datasets. Recently, retrieval-augmented deep learning approaches have been proposed to combine the strengths of retrieval-based methods and deep learning models. However, these approaches require additional training to integrate the retrieved information with the input. In this thesis, we propose a retrieval-augmented method that does not require any extra training. We implement a new baseline model for CodeXGLUE code summarization tasks using GraphCodeBERT. Our method improves the baseline's BLEU-4 and perplexity score by 0.6 and 6.7, respectively.; Bachelor of Engineering (Computer Science); 2023-04-16T04:38:18Z; 2023-04-16T04:38:18Z; 2023; Final Year Project (FYP); Tan, J. Q. (2023). Retrieval-augmented source code summarization. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/165901; https://hdl.handle.net/10356/165901; en; SCSE22-0579; application/pdf; Nanyang Technological University |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | Engineering::Computer science and engineering::Software::Programming languages; Engineering::Computer science and engineering::Software::Software engineering |
spellingShingle | Engineering::Computer science and engineering::Software::Programming languages; Engineering::Computer science and engineering::Software::Software engineering; Tan, Jia Qing; Retrieval-augmented source code summarization |
description | Automatic source code summarization aims to generate brief natural-language descriptions of code snippets in order to streamline software development. The task is challenging because highly structured source code is difficult to map to unstructured natural-language summaries. Traditional approaches rely on rule-based or retrieval-based methods, which generalize poorly, while deep learning models do not take advantage of similar candidates in the dataset. Recently, retrieval-augmented deep learning approaches have been proposed to combine the strengths of retrieval-based methods and deep learning models. However, these approaches require additional training to integrate the retrieved information with the input. In this thesis, we propose a retrieval-augmented method that requires no extra training. We implement a new baseline model for CodeXGLUE code summarization tasks using GraphCodeBERT. Our method improves the baseline's BLEU-4 and perplexity scores by 0.6 and 6.7, respectively. |
author2 | Liu Yang |
author_facet | Liu Yang; Tan, Jia Qing |
format | Final Year Project |
author | Tan, Jia Qing |
title | Retrieval-augmented source code summarization |
publisher | Nanyang Technological University |
publishDate | 2023 |
url | https://hdl.handle.net/10356/165901 |
_version_ | 1764208072054538240 |