Retrieval-augmented generation for code summarization via hybrid GNN

Source code summarization aims to generate natural language summaries from structured code snippets to help readers understand code functionality. However, automatic code summarization is challenging due to the complexity of the source code and the language gap between the source code and natural la...

Bibliographic Details
Main Authors: LIU, Shangqing, CHEN, Yu, XIE, Xiaofei, SIOW, Jingkai, LIU, Yang
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2021
Subjects: Programming Languages and Compilers; Software Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/7090
https://ink.library.smu.edu.sg/context/sis_research/article/8093/viewcontent/2006.05405.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-8093
record_format dspace
spelling sg-smu-ink.sis_research-8093 2022-04-07T07:35:13Z 2021-05-01T07:00:00Z text application/pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Programming Languages and Compilers
Software Engineering
description Source code summarization aims to generate natural language summaries from structured code snippets to help readers understand code functionality. However, automatic code summarization is challenging due to the complexity of source code and the language gap between source code and natural language summaries. Most previous approaches rely on either retrieval-based methods (which can take advantage of similar examples from the retrieval database, but generalize poorly) or generation-based methods (which generalize better, but cannot take advantage of similar examples). This paper proposes a novel retrieval-augmented mechanism to combine the benefits of both. Furthermore, to mitigate the limitation of Graph Neural Networks (GNNs) in capturing global graph structure information of source code, we propose a novel attention-based dynamic graph to complement the static graph representation of the source code, and design a hybrid message passing GNN that captures both local and global structural information. To evaluate the proposed approach, we release a new challenging benchmark, crawled from diverse large-scale open-source C projects (95k+ unique functions in total). Our method achieves state-of-the-art performance, improving on existing methods by 1.42, 2.44 and 1.29 in terms of BLEU-4, ROUGE-L and METEOR, respectively.
format text
author LIU, Shangqing
CHEN, Yu
XIE, Xiaofei
SIOW, Jingkai
LIU, Yang
title Retrieval-augmented generation for code summarization via hybrid GNN
publisher Institutional Knowledge at Singapore Management University
publishDate 2021