Context-aware retrieval-based deep commit message Generation

Commit messages recorded in version control systems contain valuable information for software development, maintenance, and comprehension. Unfortunately, developers often commit code with empty or poor-quality commit messages. To address this issue, several studies have proposed approaches to genera...

Bibliographic Details
Main Authors:	WANG, Haoye; XIA, Xin; LO, David; HE, Qiang; WANG, Xinyu; GRUNDY, John
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2021
Subjects:	Databases and Information Systems
Online Access:	https://ink.library.smu.edu.sg/sis_research/6776 https://ink.library.smu.edu.sg/context/sis_research/article/7779/viewcontent/tosem212.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-7779
record_format	dspace
spelling	sg-smu-ink.sis_research-7779 2022-01-27T10:05:20Z 2021-01-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/6776 info:doi/10.1145/3464689 https://ink.library.smu.edu.sg/context/sis_research/article/7779/viewcontent/tosem212.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Databases and Information Systems
description	Commit messages recorded in version control systems contain valuable information for software development, maintenance, and comprehension. Unfortunately, developers often commit code with empty or poor-quality commit messages. To address this issue, several studies have proposed approaches to generate commit messages from commit diffs. Recent studies use neural machine translation algorithms to translate git diffs into commit messages and have achieved some promising results. However, these learning-based methods tend to generate high-frequency words and ignore low-frequency ones. In addition, they suffer from exposure bias, which leads to a gap between the training phase and the testing phase. In this paper, we propose CoRec to address these two limitations. Specifically, we first train a context-aware encoder-decoder model that randomly selects either the previous output of the decoder or the embedding vector of a ground-truth word as context, making the model gradually aware of previous alignment choices. Given a diff for testing, the trained model is reused to retrieve the most similar diff from the training set. Finally, we use the retrieved diff to guide the probability distribution for the final generated vocabulary. Our method combines the advantages of both information retrieval and neural machine translation. We evaluate CoRec on a dataset from Liu et al. and a large-scale dataset crawled from 10k popular Java repositories on GitHub. Our experimental results show that CoRec significantly outperforms the state-of-the-art method NNGen by 19% on average in terms of BLEU.
format	text
author	WANG, Haoye; XIA, Xin; LO, David; HE, Qiang; WANG, Xinyu; GRUNDY, John
author_sort	WANG, Haoye
title	Context-aware retrieval-based deep commit message Generation
publisher	Institutional Knowledge at Singapore Management University
publishDate	2021
url	https://ink.library.smu.edu.sg/sis_research/6776 https://ink.library.smu.edu.sg/context/sis_research/article/7779/viewcontent/tosem212.pdf
_version_	1770576066855829504
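
Two steps named in the abstract, the random choice of decoder context during training and the retrieval of the most similar training diff, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of cosine similarity as the similarity measure, and the fixed selection probability `p_ground_truth` are all assumptions made for illustration, and the vectors stand in for whatever representation the trained model would produce.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_most_similar(query_vec, train_vecs):
    """Return the index of the training diff vector closest to the query.

    Assumption: similarity is measured by cosine; the record does not
    state which measure the paper uses.
    """
    return max(range(len(train_vecs)), key=lambda i: cosine(query_vec, train_vecs[i]))


def choose_context(ground_truth_emb, prev_output_emb, p_ground_truth, rng=random):
    """Randomly pick the decoder context for the next training step.

    With probability p_ground_truth the ground-truth word embedding is used;
    otherwise the decoder's own previous output is fed back.
    Assumption: a single fixed probability, chosen here for simplicity.
    """
    return ground_truth_emb if rng.random() < p_ground_truth else prev_output_emb


# Hypothetical toy vectors standing in for encoded training diffs.
train_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
best = retrieve_most_similar([0.9, 0.1], train_vecs)
context = choose_context("ground_truth", "previous_output", p_ground_truth=0.5)
```

Feeding the decoder its own previous output part of the time exposes it during training to the kind of input it sees at test time, which is the gap the abstract attributes to exposure bias.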

Similar Items