Sequence-to-sequence learning for automated software artifact generation

During the development and maintenance of a software system, developers produce many digital artifacts besides source code, e.g., requirement documents, code comments, change history, bug reports, etc. Such artifacts are valuable for developers to understand and maintain the software system. However...

Bibliographic Details
Main Authors:	LIU, Zhongxin, XIA, Xin, LO, David
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2021
Subjects:	Artificial Intelligence and Robotics; Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/7257 https://ink.library.smu.edu.sg/context/sis_research/article/8260/viewcontent/9781509922802.ch_001_pvoa.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-8260
record_format	dspace
spelling	sg-smu-ink.sis_research-82602022-09-12T10:10:49Z Sequence-to-sequence learning for automated software artifact generation LIU, Zhongxin XIA, Xin LO, David During the development and maintenance of a software system, developers produce many digital artifacts besides source code, e.g., requirement documents, code comments, change history, bug reports, etc. Such artifacts are valuable for developers to understand and maintain the software system. However, creating software artifacts can be burdensome and developers sometimes neglect to write and maintain important artifacts. This problem can be alleviated by software artifact generation tools, which can assist developers in creating software artifacts and automatically generate artifacts to replace existing empty ones. The focus of this chapter is automated software artifact generation (hereon, SAG) using seq2seq learning. This research direction is inspired by the similarities between natural language generation (NLG) and SAG and the effectiveness of seq2seq models on NLG tasks. When applied to SAG, seq2seq models are able to automatically learn generation patterns from massive software artifact data and adaptively adopt such learned patterns for generation. Compared to template-based and IR-based techniques, seq2seq-model-based approaches do not require expensive manual efforts to summarize and implement templates or rules, are not limited to term-based summaries, are able to produce novel expressions and can be more general. In addition, seq2seq learning is developing rapidly and there are more and more publicly available software artifacts on the Internet, which make seq2seq-model-based SAG a timely and promising research direction. This chapter aims to provide a comprehensive introduction to this research direction, i.e., seq2seq-model-based SAG. 
Specifically, we first introduce the preliminary knowledge of seq2seq models, including the RNN, the encoder-decoder model, the attention mechanism, and some commonly-used evaluation metrics for SAG (Sec. 5.2). Next, three case studies, i.e., code comment generation, pull request description generation, and app review response generation, are presented to illustrate how to build SE-task-specific parallel corpora for seq2seq models and how to customize seq2seq models in a SE-task-specific way (Secs. 5.3–5.5). 2021-06-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/7257 info:doi/10.1142/9789811239922_0005 https://ink.library.smu.edu.sg/context/sis_research/article/8260/viewcontent/9781509922802.ch_001_pvoa.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Artificial Intelligence and Robotics Software Engineering
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Artificial Intelligence and Robotics Software Engineering
spellingShingle	Artificial Intelligence and Robotics Software Engineering LIU, Zhongxin XIA, Xin LO, David Sequence-to-sequence learning for automated software artifact generation
description	During the development and maintenance of a software system, developers produce many digital artifacts besides source code, e.g., requirement documents, code comments, change history, bug reports, etc. Such artifacts are valuable for developers to understand and maintain the software system. However, creating software artifacts can be burdensome and developers sometimes neglect to write and maintain important artifacts. This problem can be alleviated by software artifact generation tools, which can assist developers in creating software artifacts and automatically generate artifacts to replace existing empty ones. The focus of this chapter is automated software artifact generation (hereafter, SAG) using seq2seq learning. This research direction is inspired by the similarities between natural language generation (NLG) and SAG and the effectiveness of seq2seq models on NLG tasks. When applied to SAG, seq2seq models are able to automatically learn generation patterns from massive software artifact data and adaptively adopt such learned patterns for generation. Compared to template-based and IR-based techniques, seq2seq-model-based approaches do not require expensive manual efforts to summarize and implement templates or rules, are not limited to term-based summaries, are able to produce novel expressions, and can be more general. In addition, seq2seq learning is developing rapidly and there are more and more publicly available software artifacts on the Internet, which make seq2seq-model-based SAG a timely and promising research direction. This chapter aims to provide a comprehensive introduction to this research direction, i.e., seq2seq-model-based SAG. Specifically, we first introduce the preliminary knowledge of seq2seq models, including the RNN, the encoder-decoder model, the attention mechanism, and some commonly used evaluation metrics for SAG (Sec. 5.2). 
Next, three case studies, i.e., code comment generation, pull request description generation, and app review response generation, are presented to illustrate how to build SE-task-specific parallel corpora for seq2seq models and how to customize seq2seq models in an SE-task-specific way (Secs. 5.3–5.5).
format	text
author	LIU, Zhongxin XIA, Xin LO, David
author_facet	LIU, Zhongxin XIA, Xin LO, David
author_sort	LIU, Zhongxin
title	Sequence-to-sequence learning for automated software artifact generation
title_short	Sequence-to-sequence learning for automated software artifact generation
title_full	Sequence-to-sequence learning for automated software artifact generation
title_fullStr	Sequence-to-sequence learning for automated software artifact generation
title_full_unstemmed	Sequence-to-sequence learning for automated software artifact generation
title_sort	sequence-to-sequence learning for automated software artifact generation
publisher	Institutional Knowledge at Singapore Management University
publishDate	2021
url	https://ink.library.smu.edu.sg/sis_research/7257 https://ink.library.smu.edu.sg/context/sis_research/article/8260/viewcontent/9781509922802.ch_001_pvoa.pdf
_version_	1770576292641505280

Similar Items