Neural-machine-translation-based commit message generation: how far are we?

Commit messages can be regarded as the documentation of software changes. These messages describe the content and purposes of changes, and are hence useful for program comprehension and software maintenance. However, due to a lack of time and direct motivation, commit messages are sometimes neglected...

Bibliographic Details
Main Authors: LIU, Zhongxin, XIA, Xin, HASSAN, Ahmed E., LO, David, XING, Zhenchang, WANG, Xinyu
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2018
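Subjects: Commit message generation; Nearest neighbor algorithm; Neural machine translation; Software Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/4296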
https://ink.library.smu.edu.sg/context/sis_research/article/5299/viewcontent/ase181.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-5299
record_format dspace
spelling sg-smu-ink.sis_research-5299
2019-02-21T08:36:57Z
2018-09-01T07:00:00Z
text
application/pdf
https://ink.library.smu.edu.sg/sis_research/4296
info:doi/10.1145/3238147.3238190
https://ink.library.smu.edu.sg/context/sis_research/article/5299/viewcontent/ase181.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Commit message generation
Nearest neighbor algorithm
Neural machine translation
Software Engineering
description Commit messages can be regarded as the documentation of software changes. These messages describe the content and purposes of changes, and are hence useful for program comprehension and software maintenance. However, due to a lack of time and direct motivation, commit messages are sometimes neglected by developers. To address this problem, Jiang et al. proposed an approach (we refer to it as NMT) that leverages a neural machine translation algorithm to automatically generate short commit messages from code. The reported performance of their approach is promising; however, they did not explore why their approach performs well. Thus, in this paper, we first perform an in-depth analysis of their experimental results. We find that (1) most of the test diffs from which NMT can generate high-quality messages are similar to one or more training diffs at the token level; (2) about 16% of the commit messages in Jiang et al.’s dataset are noisy, either because they were automatically generated or because they describe repetitive trivial changes; and (3) the performance of NMT declines by a large amount after removing such noisy commit messages. In addition, NMT is complicated and time-consuming. Inspired by our first finding, we propose a simpler and faster approach, named NNGen (Nearest Neighbor Generator), to generate concise commit messages using the nearest neighbor algorithm. Our experimental results show that NNGen is over 2,600 times faster than NMT and outperforms NMT in terms of BLEU (an accuracy measure that is widely used to evaluate machine translation systems) by 21%. Finally, we also discuss some observations for the road ahead for automated commit message generation to inspire other researchers.
format text
author LIU, Zhongxin
XIA, Xin
HASSAN, Ahmed E.
LO, David
XING, Zhenchang
WANG, Xinyu
title Neural-machine-translation-based commit message generation: how far are we?
publisher Institutional Knowledge at Singapore Management University
publishDate 2018
url https://ink.library.smu.edu.sg/sis_research/4296
https://ink.library.smu.edu.sg/context/sis_research/article/5299/viewcontent/ase181.pdf
_version_ 1770574603057364992
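
The description above says that NNGen generates a commit message with the nearest neighbor algorithm, motivated by the finding that many test diffs resemble training diffs at the token level. The record does not give the details of that algorithm, so the following is only a minimal sketch of the general idea under stated assumptions: diffs are split on whitespace, represented as bag-of-words token counts, and compared with cosine similarity, and the message of the single most similar training diff is reused verbatim. The function names (`tokenize`, `cosine`, `nearest_message`) and the toy training pairs are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def tokenize(diff: str) -> Counter:
    """Split a diff on whitespace and count tokens (assumed bag-of-words model)."""
    return Counter(diff.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty diff is treated as similar to nothing
    return dot / (norm_a * norm_b)


def nearest_message(new_diff: str, training: list[tuple[str, str]]) -> str:
    """Return the commit message of the training diff most similar to new_diff."""
    query = tokenize(new_diff)
    _, best_message = max(
        training, key=lambda pair: cosine(query, tokenize(pair[0]))
    )
    return best_message


# Toy (diff, commit message) pairs, invented purely for illustration.
TRAINING = [
    ("- version = 1.0.0 + version = 1.0.1", "Bump version to 1.0.1"),
    ("+ import java.util.List ; - import java.util.ArrayList ;", "Fix imports"),
    ("- README typo teh + README typo the", "Fix typo in README"),
]

if __name__ == "__main__":
    # The retrieved message is copied as-is from the nearest training example.
    print(nearest_message("- version = 2.3.0 + version = 2.3.1", TRAINING))
```

Because the sketch only retrieves and copies an existing message, it needs no model training, which is consistent with the speed advantage the description reports. It also shows a limitation of pure retrieval: the returned message can carry details of the training commit, such as its version number, that do not match the new diff.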