Retrieval-augmented source code summarization
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2023 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/165901 |
Institution: | Nanyang Technological University |
Summary: | Automatic source code summarization aims to generate brief natural-language descriptions of code snippets in order to improve the software development workflow. The task is challenging because highly structured source code is difficult to match to unstructured natural-language summaries. Traditional approaches rely on rule-based or retrieval-based methods, which generalize poorly, while deep learning models have not taken advantage of similar candidates in the dataset. Recently, retrieval-augmented deep learning approaches have been proposed to combine the strengths of retrieval-based methods and deep learning models. However, these approaches require additional training to integrate the retrieved information with the input. In this thesis, we propose a retrieval-augmented method that does not require any extra training. We implement a new baseline model for the CodeXGLUE code summarization tasks using GraphCodeBERT. Our method improves the baseline's BLEU-4 and perplexity scores by 0.6 and 6.7, respectively. |
---|