Is the ground truth really accurate? Dataset purification for automated program repair

Datasets of real-world bugs shipped with human-written patches are intensively used in the evaluation of existing automated program repair (APR) techniques, where the human-written patches serve as the ground truth for manual or automated assessment approaches that evaluate the correctness of test-suite-adequate patches. An inaccurate human-written patch tangled with other code changes threatens the reliability of the assessment results. The construction of such datasets therefore requires substantial manual effort to isolate the real bug fix from each bug-fixing commit. However, this manual work is time-consuming and prone to mistakes, and little is known about whether the ground truth in such datasets is really accurate. In this paper, we propose DEPTEST, an automated DatasEt Purification technique from the perspective of triggering Tests. Leveraging coverage analysis and delta debugging, DEPTEST automatically identifies and filters out the code changes irrelevant to the bug exposed by the triggering tests. To measure the strength of DEPTEST, we run it on the most extensively used dataset, Defects4J, which claims to have already excluded all irrelevant code changes from each bug fix via manual purification. Our experiment indicates that even in this dataset, 41.01% of the human-written patches can be further reduced, by 4.3 lines on average, with the largest reduction reaching 53 lines. This indicates DEPTEST's great potential for assisting in the construction of datasets of accurate bug fixes. Furthermore, based on the purified patches, we re-dissect Defects4J and systematically revisit the APR of multi-chunk bugs to provide insights for future research targeting such bugs.
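The abstract outlines DEPTEST's core mechanism: coverage analysis first discards changes that the triggering tests never execute, then delta debugging minimizes the remaining hunks. As a rough illustration only (not the authors' implementation), the sketch below applies a ddmin-style reduction over the hunks of a bug-fixing commit; `tests_pass` is a hypothetical callback assumed to apply a candidate subset of hunks to the buggy program, rebuild it, and report whether the triggering tests pass.

```python
from typing import Callable, FrozenSet, List

def ddmin_patch(hunks: List[str],
                tests_pass: Callable[[FrozenSet[str]], bool]) -> List[str]:
    """Return a locally minimal subset of `hunks` that still makes the
    triggering tests pass (simplified ddmin; a sketch, not DEPTEST itself)."""
    assert tests_pass(frozenset(hunks)), "the full patch must fix the bug"
    n = 2  # current granularity: number of chunks the hunk list is split into
    while len(hunks) >= 2:
        size = max(1, len(hunks) // n)
        chunks = [hunks[i:i + size] for i in range(0, len(hunks), size)]
        reduced = False
        for chunk in chunks:
            complement = [h for h in hunks if h not in chunk]
            # If the patch still fixes the bug without this chunk, the
            # chunk is irrelevant to the triggering tests: drop it.
            if complement and tests_pass(frozenset(complement)):
                hunks = complement
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if n >= len(hunks):
                break  # already at single-hunk granularity; done
            n = min(len(hunks), n * 2)  # refine the split and retry
    return hunks
```

Since every `tests_pass` call implies a rebuild and a test run, the coverage-analysis step the abstract mentions would matter in practice: pre-filtering hunks that the triggering tests never cover shrinks the search space before this comparatively expensive minimization begins.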

Bibliographic Details
Main Authors: YANG, Deheng, LEI, Yan, MAO, Xiaoguang, LO, David, XIE, Huan, YAN, Meng
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2021
Subjects: bug dataset; automated program repair; dataset purification; Artificial Intelligence and Robotics; Databases and Information Systems
Online Access: https://ink.library.smu.edu.sg/sis_research/6878
https://ink.library.smu.edu.sg/context/sis_research/article/7881/viewcontent/963000a096.pdf
DOI: 10.1109/SANER50967.2021.00018
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Research Collection School Of Computing and Information Systems
Institution: Singapore Management University