On reliability of patch correctness assessment

Current state-of-the-art automatic software repair (ASR) techniques rely heavily on incomplete specifications, or test suites, to generate repairs. This, however, may cause ASR tools to generate repairs that are incorrect and hard to generalize. To assess patch correctness, researchers have been fol...

Bibliographic Details
Main Authors:	LE, Xuan-Bach D.; BAO, Lingfeng; LO, David; XIA, Xin; LI, Shanping; PASAREANU, Corina S.
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2019
Subjects:	Automated program repair; empirical study; test case generation; Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/4481 https://ink.library.smu.edu.sg/context/sis_research/article/5484/viewcontent/icse192.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-5484
record_format	dspace
spelling	sg-smu-ink.sis_research-5484 2020-07-24T00:49:27Z 2019-05-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/4481 info:doi/10.1109/ICSE.2019.00064 https://ink.library.smu.edu.sg/context/sis_research/article/5484/viewcontent/icse192.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Automated program repair; empirical study; test case generation; Software Engineering
description	Current state-of-the-art automatic software repair (ASR) techniques rely heavily on incomplete specifications, or test suites, to generate repairs. This, however, may cause ASR tools to generate repairs that are incorrect and hard to generalize. To assess patch correctness, researchers have been following two methods separately: (1) automated annotation, wherein patches are automatically labeled by an independent test suite (ITS), so that a patch passing the ITS is regarded as correct or generalizable, and incorrect otherwise; and (2) author annotation, wherein authors of ASR techniques manually annotate the correctness labels of patches generated by their own and competing tools. While automated annotation cannot ascertain that a patch is actually correct, author annotation is prone to subjectivity. This concern has caused an ongoing debate on the appropriate ways to assess the effectiveness of the numerous ASR techniques proposed recently. In this work, we propose to assess the reliability of author and automated annotations for patch correctness assessment. We do this by first constructing a gold set of correctness labels for 189 randomly selected patches generated by 8 state-of-the-art ASR techniques, through a user study involving 35 professional developers as independent annotators. By measuring inter-rater agreement as a proxy for annotation quality, as commonly done in the literature, we demonstrate that our constructed gold set is on par with other high-quality gold sets. We then compare labels generated by author and automated annotations with this gold set to assess the reliability of the patch assessment methodologies. We subsequently report several findings and highlight implications for future studies.
format	text
author	LE, Xuan-Bach D.; BAO, Lingfeng; LO, David; XIA, Xin; LI, Shanping; PASAREANU, Corina S.
author_sort	LE, Xuan-Bach D.
title	On reliability of patch correctness assessment
publisher	Institutional Knowledge at Singapore Management University
publishDate	2019
url	https://ink.library.smu.edu.sg/sis_research/4481 https://ink.library.smu.edu.sg/context/sis_research/article/5484/viewcontent/icse192.pdf
_version_	1770574870862626816
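
The description in this record refers to measuring agreement between annotators and to comparing author and automated labels against a gold set. As a rough illustration only, the sketch below computes Cohen's kappa between two label sequences. The record does not say which agreement statistic the authors used, so the choice of Cohen's kappa, the function name `cohen_kappa`, and the sample labels are all assumptions made for this example and are not data from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both raters pick the same category
    # if each labels independently according to their own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    categories = count_a.keys() | count_b.keys()
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both raters used one identical category throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up labels for four patches (not results from the study).
gold_labels = ["correct", "incorrect", "incorrect", "correct"]
automated_labels = ["correct", "correct", "incorrect", "correct"]

kappa = cohen_kappa(gold_labels, automated_labels)
print(f"kappa = {kappa:.2f}")
```

A kappa of 1 means perfect agreement and a value near 0 means agreement no better than chance. With more than two annotators per patch, a multi-rater statistic would be needed instead.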

Similar Items