Overfitting in semantics-based automated program repair

The primary goal of Automated Program Repair (APR) is to automatically fix buggy software, to reduce the manual bug-fix burden that presently rests on human developers. Existing APR techniques can be generally divided into two families: semantics- vs. heuristics-based. Semantics-based APR uses symbo...

Bibliographic Details
Main Authors:	LE, Dinh Xuan Bach; THUNG, Ferdian; LO, David; LE GOUES, Claire
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2018
Subjects:	Automated program repair; Program synthesis; Symbolic execution; Patch overfitting; Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/3986 https://ink.library.smu.edu.sg/context/sis_research/article/4988/viewcontent/Overfitting_in_semantics_based_automated_program_repair_afv.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-4988
record_format	dspace
spelling	sg-smu-ink.sis_research-4988 2020-01-15T14:57:07Z 2018-10-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/3986 info:doi/10.1007/s10664-017-9577-2 https://ink.library.smu.edu.sg/context/sis_research/article/4988/viewcontent/Overfitting_in_semantics_based_automated_program_repair_afv.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Automated program repair; Program synthesis; Symbolic execution; Patch overfitting; Software Engineering
description	The primary goal of Automated Program Repair (APR) is to fix buggy software automatically, reducing the manual bug-fixing burden that presently rests on human developers. Existing APR techniques fall broadly into two families: semantics-based and heuristics-based. Semantics-based APR uses symbolic execution and test suites to extract semantic constraints, then uses program synthesis to produce repairs that satisfy those constraints. Heuristics-based APR generates large populations of repair candidates via source manipulation and searches for the best among them. Both families largely rely on the assumption that a program is correctly patched if the generated patch leads the program to pass all provided test cases. Patch correctness is thus an especially pressing concern: a repair technique may generate overfitting patches, which lead a program to pass all existing test cases but fail to generalize beyond them. In this work, we revisit the overfitting problem with a focus on semantics-based APR techniques, complementing previous studies of overfitting in heuristics-based APR. We perform our study on the IntroClass and Codeflaws benchmarks, two datasets well suited to assessing repair quality, to systematically characterize and understand the nature of overfitting in semantics-based APR. We find that, as in heuristics-based APR, overfitting also occurs in semantics-based APR, and in various ways.
format	text
author	LE, Dinh Xuan Bach; THUNG, Ferdian; LO, David; LE GOUES, Claire
author_sort	LE, Dinh Xuan Bach
title	Overfitting in semantics-based automated program repair
title_sort	overfitting in semantics-based automated program repair
publisher	Institutional Knowledge at Singapore Management University
publishDate	2018
url	https://ink.library.smu.edu.sg/sis_research/3986 https://ink.library.smu.edu.sg/context/sis_research/article/4988/viewcontent/Overfitting_in_semantics_based_automated_program_repair_afv.pdf
_version_	1770574112140296192
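
Illustration (not part of the catalogued record): the abstract defines an overfitting patch as one that passes all existing test cases but fails to generalize beyond them. The sketch below shows that idea on a toy two-argument maximum function. Every name in it (buggy_max, overfitting_patch, correct_patch, REPAIR_TESTS, HELD_OUT_TESTS, pass_rate) is hypothetical and chosen for illustration; none of it is taken from the paper, from IntroClass, or from Codeflaws, and it does not reproduce any particular repair tool.

```python
from typing import Callable, List, Tuple

# A test case pairs input arguments with the expected output.
TestCase = Tuple[Tuple[int, int], int]


def buggy_max(a: int, b: int) -> int:
    # Hypothetical bug: always returns the first argument.
    return a


def overfitting_patch(a: int, b: int) -> int:
    # Special-cases the one failing repair test instead of fixing the logic.
    if (a, b) == (1, 2):
        return 2
    return a


def correct_patch(a: int, b: int) -> int:
    # Fixes the underlying comparison.
    return a if a >= b else b


# Tests available to the (imagined) repair technique.
REPAIR_TESTS: List[TestCase] = [((3, 1), 3), ((1, 2), 2), ((5, 5), 5)]

# Independent tests withheld from repair, used only to judge generalization.
HELD_OUT_TESTS: List[TestCase] = [((0, 7), 7), ((-4, -1), -1), ((9, 2), 9)]


def pass_rate(program: Callable[[int, int], int], tests: List[TestCase]) -> float:
    """Fraction of tests on which the program returns the expected output."""
    passed = sum(1 for args, expected in tests if program(*args) == expected)
    return passed / len(tests)


if __name__ == "__main__":
    for name, program in [
        ("buggy", buggy_max),
        ("overfitting patch", overfitting_patch),
        ("correct patch", correct_patch),
    ]:
        print(
            f"{name}: repair tests {pass_rate(program, REPAIR_TESTS):.2f}, "
            f"held-out tests {pass_rate(program, HELD_OUT_TESTS):.2f}"
        )
```

Both patches pass every repair test, so a technique that accepts any test-passing patch cannot tell them apart. Only the held-out tests expose the difference, which is the general reason a separate test set is useful for judging repair quality.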

Similar Items