When automated program repair meets regression testing - an extensive study on two million patches

In recent years, Automated Program Repair (APR) has been extensively studied in academia and even drawn wide attention from the industry. However, APR techniques can be extremely time consuming since (1) a large number of patches can be generated for a given bug, and (2) each patch needs to be execu...

Bibliographic Details
Main Authors: Lou, Yiling; Yang, Jun; Benton, Samuel; Hao, Dan; Tan, Lin; Chen, Zhenpeng; Zhang, Lu; Zhang, Lingming
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2025
Subjects: Computer and Information Science, Patch validation, Program repair
Online Access: https://hdl.handle.net/10356/182504
Institution: Nanyang Technological University
id sg-ntu-dr.10356-182504
record_format dspace
spelling sg-ntu-dr.10356-182504
2025-02-07T15:38:06Z
Published version
2025-02-05T04:13:04Z
2025-02-05T04:13:04Z
2024
Journal Article
Lou, Y., Yang, J., Benton, S., Hao, D., Tan, L., Chen, Z., Zhang, L. & Zhang, L. (2024). When automated program repair meets regression testing - an extensive study on two million patches. ACM Transactions on Software Engineering and Methodology, 33(7), 180-. https://dx.doi.org/10.1145/3672450
1049-331X
https://hdl.handle.net/10356/182504
10.1145/3672450
2-s2.0-85206219383
7
33
180
en
ACM Transactions on Software Engineering and Methodology
© 2024 the Owner/Author(s). This work is licensed under a Creative Commons Attribution International 4.0 License.
application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
Patch validation
Program repair
description In recent years, Automated Program Repair (APR) has been extensively studied in academia and even drawn wide attention from the industry. However, APR techniques can be extremely time consuming since (1) a large number of patches can be generated for a given bug, and (2) each patch needs to be executed on the original tests to ensure its correctness. In the literature, various techniques (e.g., based on learning, mining, and constraint solving) have been proposed/studied to reduce the number of patches. Intuitively, every patch can be treated as a software revision during regression testing; thus, traditional Regression Test Selection (RTS) techniques can be leveraged to only execute the tests affected by each patch (as the other tests would keep the same outcomes) to further reduce patch execution time. However, few APR systems actually adopt RTS and there is still a lack of systematic studies demonstrating the benefits of RTS and the impact of different RTS strategies on APR. To this end, this article presents the first extensive study of widely used RTS techniques at different levels (i.e., class/method/statement levels) for 12 state-of-the-art APR systems on over 2M patches. 
Our study reveals various practical guidelines for bridging the gap between APR and regression testing, including: (1) the number of patches widely used for measuring APR efficiency can incur skewed conclusions, and the use of inconsistent RTS configurations can further skew the conclusions; (2) all studied RTS techniques can substantially improve APR efficiency and should be considered in future APR work; (3) method- and statement-level RTS outperform class-level RTS substantially and should be preferred; (4) RTS techniques can substantially outperform state-of-the-art test prioritization techniques for APR, and combining them can further improve APR efficiency; and (5) traditional Regression Test Prioritization (RTP) widely studied in regression testing performs even better than APR-specific test prioritization when combined with most RTS techniques. Furthermore, we also present the detailed impact of different patch categories and patch validation strategies on our findings.
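The selection idea in the description above can be illustrated with a minimal sketch: treat each patch as a revision and rerun only the tests whose coverage on the unpatched program touches the patched code. The coverage map, test names, and `select_tests` helper below are hypothetical and are not taken from the study or from any APR or RTS tool; statement-level selection is omitted for brevity.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical coverage data: test name -> code elements it executes on the
# original (unpatched) program. Elements are written as "Class#method".
COVERAGE: Dict[str, FrozenSet[str]] = {
    "testParse": frozenset({"Parser#parse", "Lexer#next"}),
    "testPrint": frozenset({"Printer#render"}),
    "testReset": frozenset({"Parser#reset"}),
}


def to_class(element: str) -> str:
    """Drop the method part of a "Class#method" element name."""
    return element.split("#", 1)[0]


def select_tests(
    coverage: Dict[str, FrozenSet[str]],
    patched_elements: Set[str],
    level: str,
) -> List[str]:
    """Return the tests affected by a patch at the given granularity.

    level="class" compares class names only; level="method" compares full
    "Class#method" names. Tests that are not selected are assumed to keep
    the outcome they had on the unpatched program.
    """
    if level not in ("class", "method"):
        raise ValueError(f"unsupported level: {level}")
    project = to_class if level == "class" else (lambda element: element)
    changed = {project(e) for e in patched_elements}
    return sorted(
        test
        for test, elements in coverage.items()
        if changed & {project(e) for e in elements}
    )


# A hypothetical patch that edits only the body of Parser#parse.
patch = {"Parser#parse"}
class_level = select_tests(COVERAGE, patch, "class")
method_level = select_tests(COVERAGE, patch, "method")
```

On this toy data, class-level selection reruns both tests that touch `Parser`, while method-level selection reruns only `testParse`; this is only meant to show why a finer granularity can select fewer tests per patch, not to reproduce any measurement from the article.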
author2 School of Computer Science and Engineering
format Article
author Lou, Yiling
Yang, Jun
Benton, Samuel
Hao, Dan
Tan, Lin
Chen, Zhenpeng
Zhang, Lu
Zhang, Lingming
author_sort Lou, Yiling
title When automated program repair meets regression testing - an extensive study on two million patches
publishDate 2025
url https://hdl.handle.net/10356/182504
_version_ 1823807386847019008