Why my code summarization model does not work: Code comment improvement with category prediction

Bibliographic Details
Main Authors:	CHEN, Qiuyuan; XIA, Xin; HU, Han; LO, David; LI, Shanping
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University, 2021
Subjects:	Code summarization; code comment; comment classification; Databases and Information Systems; Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/6706 (record page); https://ink.library.smu.edu.sg/context/sis_research/article/7709/viewcontent/TOSEM_Qiuyuan_Chen_2021_Why_My_Code_Summarization_Model_Does_Not_Work.pdf (full text, PDF)
Institution:	Singapore Management University
DOI:	10.1145/3434280
License:	CC BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Collection:	Research Collection School Of Computing and Information Systems

Description

Code summarization aims to generate a comment for a given block of source code, and it is normally performed by training machine learning algorithms on existing pairs of code blocks and comments. In practice, code comments have different intentions: some explain how a method works, while others explain why it was written. Previous work has shown that a relationship exists between a code block and the category of the comment associated with it. In this article, we investigate to what extent this relationship can be exploited to improve code summarization performance. We first classify comments into six intention categories, “what,” “why,” “how-to-use,” “how-it-is-done,” “property,” and “others,” and manually label 20,000 code-comment pairs. Based on this dataset, we conduct an experiment to investigate how different state-of-the-art code summarization approaches perform on each category. We find that performance varies substantially across categories, and that the category on which a model performs best differs from model to model. In particular, none of the models achieves its best performance on “why” or “property” comments.

We design a composite approach to demonstrate that comment category prediction can help code summarization reach better results. The approach uses code blocks labeled with their comment categories to train a classifier that infers categories. It then selects the most suitable model for each inferred category and outputs the composite results. Our composite approach outperforms approaches that do not consider comment categories, achieving relative improvements of 8.57% in ROUGE-L and 16.34% in BLEU-4.
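The composite approach described in the abstract (predict a comment category, then route the code block to the model that does best on that category) can be sketched roughly as below. This is a hypothetical Python illustration, not the authors' implementation: the function names, the per-category score table, the keyword-based toy classifier and the placeholder "models" are all assumptions, and the scores are invented numbers, not results from the paper. In the paper, the classifier and the summarizers are trained models.

```python
from typing import Callable, Dict, List

# The six intention categories named in the abstract.
CATEGORIES = ["what", "why", "how-to-use", "how-it-is-done", "property", "others"]

Summarizer = Callable[[str], str]  # hypothetical: code block -> generated comment
Classifier = Callable[[str], str]  # hypothetical: code block -> one of CATEGORIES


def build_routing_table(scores: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Map each category to the model with the highest per-category score.

    scores[model][category] is assumed to be a validation metric (e.g. ROUGE-L)
    for that model on comments of that category. Categories with no scores are
    left out, so they fall back to the default model at inference time.
    """
    table: Dict[str, str] = {}
    for category in CATEGORIES:
        candidates = {m: s[category] for m, s in scores.items() if category in s}
        if candidates:
            table[category] = max(candidates, key=candidates.__getitem__)
    return table


def composite_summarize(
    code_blocks: List[str],
    classify: Classifier,
    models: Dict[str, Summarizer],
    routing: Dict[str, str],
    default_model: str,
) -> List[str]:
    """Predict a category for each block, then run the model routed to it."""
    outputs = []
    for code in code_blocks:
        category = classify(code)
        model_name = routing.get(category, default_model)
        outputs.append(models[model_name](code))
    return outputs


def toy_classifier(code: str) -> str:
    # Hypothetical keyword rule standing in for the trained category classifier.
    return "why" if "raise" in code else "what"


if __name__ == "__main__":
    # Placeholder summarizers that return fixed strings; illustrative only.
    models = {
        "model_a": lambda code: "returns the sum of a and b",
        "model_b": lambda code: "guards against negative input",
    }
    # Invented per-category scores, not figures from the paper.
    scores = {
        "model_a": {"what": 0.41, "why": 0.18},
        "model_b": {"what": 0.35, "why": 0.22},
    }
    routing = build_routing_table(scores)  # {'what': 'model_a', 'why': 'model_b'}
    blocks = ["def add(a, b): return a + b", "if x < 0: raise ValueError(x)"]
    print(composite_summarize(blocks, toy_classifier, models, routing, "model_a"))
    # ['returns the sum of a and b', 'guards against negative input']
```

Keeping a default model means that categories with no per-category evidence (here "how-to-use", "property" and the rest) still receive a comment instead of failing.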

Similar Items