Retrieval-augmented source code summarization

Bibliographic Details
Main Author: Tan, Jia Qing
Other Authors: Liu Yang
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Subjects:
Online Access: https://hdl.handle.net/10356/165901
Institution: Nanyang Technological University
Description
Summary: Automatic source code summarization aims to generate brief natural-language descriptions of code snippets in order to improve the software development workflow. The task is challenging because highly structured source code is difficult to match to unstructured natural-language summaries. Traditional approaches rely on rule-based or retrieval-based methods, which generalize poorly, while deep learning models do not take advantage of similar candidates in the dataset. Recently, retrieval-augmented deep learning approaches have been proposed to combine the strengths of retrieval-based methods and deep learning models. However, these approaches require additional training to integrate the retrieved information with the input. In this thesis, we propose a retrieval-augmented method that does not require any extra training. We implement a new baseline model for CodeXGLUE code summarization tasks using GraphCodeBERT. Our method improves the baseline's BLEU-4 and perplexity scores by 0.6 and 6.7, respectively.
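
Illustration (not part of the record): the summary does not say how similar candidates are retrieved or how they are combined with the input, so the following is only a generic sketch of a retrieval step, not the thesis's method. It assumes a token-set Jaccard similarity and a tiny in-memory corpus of (code, summary) pairs; the names tokenize, jaccard, retrieve, and CORPUS are hypothetical.

```python
import re


def tokenize(code: str) -> set[str]:
    """Split code into a set of lowercase identifier and number tokens."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+", code.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(query: str, corpus: list[tuple[str, str]]) -> tuple[str, str, float]:
    """Return (code, summary, score) for the corpus entry most similar to query."""
    q = tokenize(query)
    best_code, best_summary = max(corpus, key=lambda pair: jaccard(q, tokenize(pair[0])))
    return best_code, best_summary, jaccard(q, tokenize(best_code))


# Hypothetical toy corpus of (code, summary) pairs; an assumption for illustration only.
CORPUS = [
    ("def add(a, b): return a + b", "Add two numbers."),
    ("def read_file(path): return open(path).read()", "Read a file into a string."),
]

# The retrieved summary could then be supplied to a summarization model
# alongside the query code; how that is done is not stated in the record.
code, summary, score = retrieve("def sum_two(a, b): return a + b", CORPUS)
print(summary, round(score, 3))
```

In a retrieval-augmented setup, a neighbour found this way would serve as extra context for the model. The similarity measure is a stand-in chosen for simplicity.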