Overfitting in semantics-based automated program repair

The primary goal of Automated Program Repair (APR) is to automatically fix buggy software, to reduce the manual bug-fix burden that presently rests on human developers. Existing APR techniques can be generally divided into two families: semantics- vs. heuristics-based. Semantics-based APR uses symbo...

Bibliographic Details
Main Authors: LE, Dinh Xuan Bach, THUNG, Ferdian, LO, David, LE GOUES, Claire
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2018
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/3986
https://ink.library.smu.edu.sg/context/sis_research/article/4988/viewcontent/Overfitting_in_semantics_based_automated_program_repair_afv.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-4988
record_format dspace
spelling sg-smu-ink.sis_research-4988 2020-01-15T14:57:07Z
2018-10-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/3986 info:doi/10.1007/s10664-017-9577-2 https://ink.library.smu.edu.sg/context/sis_research/article/4988/viewcontent/Overfitting_in_semantics_based_automated_program_repair_afv.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Automated program repair Program synthesis Symbolic execution Patch overfitting Software Engineering
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Automated program repair
Program synthesis
Symbolic execution
Patch overfitting
Software Engineering
description The primary goal of Automated Program Repair (APR) is to automatically fix buggy software, reducing the manual bug-fixing burden that presently rests on human developers. Existing APR techniques can generally be divided into two families: semantics-based and heuristics-based. Semantics-based APR uses symbolic execution and test suites to extract semantic constraints, and uses program synthesis to synthesize repairs that satisfy the extracted constraints. Heuristics-based APR generates large populations of repair candidates via source manipulation, and searches for the best among them. Both families largely rely on the assumption that a program is correctly patched if the generated patch leads the program to pass all provided test cases. Patch correctness is thus an especially pressing concern. A repair technique may generate overfitting patches, which lead a program to pass all existing test cases but fail to generalize beyond them. In this work, we revisit the overfitting problem with a focus on semantics-based APR techniques, complementing previous studies of the overfitting problem in heuristics-based APR. We perform our study using the IntroClass and Codeflaws benchmarks, two datasets well-suited for assessing repair quality, to systematically characterize and understand the nature of overfitting in semantics-based APR. We find that, as in heuristics-based APR, overfitting also occurs in semantics-based APR, in various ways.
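The notion of an overfitting patch in the description above can be illustrated with a small sketch. Everything in it is hypothetical and chosen only for illustration: the median-of-three task, the function names, and both test lists are assumptions, not material from the paper or its benchmarks. It shows a candidate patch that passes every test used during repair yet fails on held-out tests.

```python
# Hypothetical illustration of patch overfitting; no function or test here
# is taken from the paper or from the IntroClass/Codeflaws benchmarks.

def median_reference(a, b, c):
    """Intended behaviour: return the median of three integers."""
    return sorted([a, b, c])[1]


def median_overfit_patch(a, b, c):
    """A candidate 'repair' that is only right when a or b is the median."""
    if a <= b <= c or c <= b <= a:
        return b
    return a  # incorrect whenever c is the median


def passes(candidate, tests):
    """Return True if the candidate gives the expected output on every test."""
    return all(candidate(*args) == expected for args, expected in tests)


# Tests available to the (assumed) repair tool: c is never the median here.
repair_tests = [((1, 2, 3), 2), ((3, 2, 1), 2), ((2, 1, 3), 2), ((2, 3, 1), 2)]

# Held-out tests used only to judge whether the patch generalizes.
held_out_tests = [((1, 3, 2), 2), ((3, 1, 2), 2), ((5, 5, 1), 5)]

if __name__ == "__main__":
    print("passes repair tests:  ", passes(median_overfit_patch, repair_tests))
    print("passes held-out tests:", passes(median_overfit_patch, held_out_tests))
```

Under these assumptions, the patch would be accepted by any technique that equates passing the provided tests with correctness, which is why an independent held-out test suite is useful for assessing repair quality.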
format text
author LE, Dinh Xuan Bach
THUNG, Ferdian
LO, David
LE GOUES, Claire
title Overfitting in semantics-based automated program repair
publisher Institutional Knowledge at Singapore Management University
publishDate 2018
url https://ink.library.smu.edu.sg/sis_research/3986
https://ink.library.smu.edu.sg/context/sis_research/article/4988/viewcontent/Overfitting_in_semantics_based_automated_program_repair_afv.pdf
_version_ 1770574112140296192