Automatic mining of functionally equivalent code fragments via random testing

Similar code may exist in large software projects due to some common software engineering practices, such as copying and pasting code and n-version programming. Although previous work has studied syntactic equivalence and small-scale, coarse-grained program-level and function-level semantic equivale...

Full description

Saved in:

Bibliographic Details
Main Authors:	JIANG, Lingxiao, SU, Zhendong
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2009
Subjects:	code clones random testing functional equivalence Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/954 https://ink.library.smu.edu.sg/context/sis_research/article/1953/viewcontent/AutomaticMiningFuncEquivCode_ISSTA2009.pdf
Tags:	Add Tag No Tags, Be the first to tag this record!
Institution:	Singapore Management University
Language:	English

id	sg-smu-ink.sis_research-1953
record_format	dspace
spelling	sg-smu-ink.sis_research-19532017-02-04T10:21:26Z Automatic mining of functionally equivalent code fragments via random testing JIANG, Lingxiao SU, Zhendong Similar code may exist in large software projects due to some common software engineering practices, such as copying and pasting code and n-version programming. Although previous work has studied syntactic equivalence and small-scale, coarse-grained program-level and function-level semantic equivalence, it is not known whether significant fine-grained, code-level semantic duplications exist. Detecting such semantic equivalence is also desirable because it can enable many applications such as code understanding, maintenance, and optimization. In this paper, we introduce the first algorithm to automatically mine functionally equivalent code fragments of arbitrary size - down to an executable statement. Our notion of functional equivalence is based on input and output behavior. Inspired by Schwartz's randomized polynomial identity testing, we develop our core algorithm using automated random testing: (1) candidate code fragments are automatically extracted from the input program; and (2) random inputs are generated to partition the code fragments based on their output values on the generated inputs. We implemented the algorithm and conducted a large-scale empirical evaluation of it on the Linux kernel 2.6.24. Our results show that there exist many functionally equivalent code fragments that are syntactically different (i.e., they are unlikely due to copying and pasting code). The algorithm also scales to million-line programs; it was able to analyze the Linux kernel with several days of parallel processing. 2009-05-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/954 info:doi/10.1145/1572272.1572283 https://ink.library.smu.edu.sg/context/sis_research/article/1953/viewcontent/AutomaticMiningFuncEquivCode_ISSTA2009.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University code clones random testing functional equivalence Software Engineering
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	code clones random testing functional equivalence Software Engineering
spellingShingle	code clones random testing functional equivalence Software Engineering JIANG, Lingxiao SU, Zhendong Automatic mining of functionally equivalent code fragments via random testing
description	Similar code may exist in large software projects due to some common software engineering practices, such as copying and pasting code and n-version programming. Although previous work has studied syntactic equivalence and small-scale, coarse-grained program-level and function-level semantic equivalence, it is not known whether significant fine-grained, code-level semantic duplications exist. Detecting such semantic equivalence is also desirable because it can enable many applications such as code understanding, maintenance, and optimization. In this paper, we introduce the first algorithm to automatically mine functionally equivalent code fragments of arbitrary size - down to an executable statement. Our notion of functional equivalence is based on input and output behavior. Inspired by Schwartz's randomized polynomial identity testing, we develop our core algorithm using automated random testing: (1) candidate code fragments are automatically extracted from the input program; and (2) random inputs are generated to partition the code fragments based on their output values on the generated inputs. We implemented the algorithm and conducted a large-scale empirical evaluation of it on the Linux kernel 2.6.24. Our results show that there exist many functionally equivalent code fragments that are syntactically different (i.e., they are unlikely due to copying and pasting code). The algorithm also scales to million-line programs; it was able to analyze the Linux kernel with several days of parallel processing.
format	text
author	JIANG, Lingxiao SU, Zhendong
author_facet	JIANG, Lingxiao SU, Zhendong
author_sort	JIANG, Lingxiao
title	Automatic mining of functionally equivalent code fragments via random testing
title_short	Automatic mining of functionally equivalent code fragments via random testing
title_full	Automatic mining of functionally equivalent code fragments via random testing
title_fullStr	Automatic mining of functionally equivalent code fragments via random testing
title_full_unstemmed	Automatic mining of functionally equivalent code fragments via random testing
title_sort	automatic mining of functionally equivalent code fragments via random testing
publisher	Institutional Knowledge at Singapore Management University
publishDate	2009
url	https://ink.library.smu.edu.sg/sis_research/954 https://ink.library.smu.edu.sg/context/sis_research/article/1953/viewcontent/AutomaticMiningFuncEquivCode_ISSTA2009.pdf
_version_	1770570791955464192

Automatic mining of functionally equivalent code fragments via random testing

Similar Items