Scalable detection of semantic clones

Several techniques have been developed for identifying similar code fragments in programs. These similar fragments, referred to as code clones, can be used to identify redundant code, locate bugs, or gain insight into program design. Existing scalable approaches to clone detection are limited to fin...

Bibliographic Details
Main Authors:	GABEL, Mark; JIANG, Lingxiao; SU, Zhendong
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2008
Subjects:	program dependence graph; refactoring; software maintenance; clone detection; Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/934 https://ink.library.smu.edu.sg/context/sis_research/article/1933/viewcontent/icse08_clone.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-1933
record_format	dspace
spelling	sg-smu-ink.sis_research-1933 2017-02-05T07:17:23Z Scalable detection of semantic clones GABEL, Mark; JIANG, Lingxiao; SU, Zhendong
2008-05-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/934 info:doi/10.1145/1368088.1368132 https://ink.library.smu.edu.sg/context/sis_research/article/1933/viewcontent/icse08_clone.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University program dependence graph refactoring software maintenance clone detection Software Engineering
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	program dependence graph; refactoring; software maintenance; clone detection; Software Engineering
description	Several techniques have been developed for identifying similar code fragments in programs. These similar fragments, referred to as code clones, can be used to identify redundant code, locate bugs, or gain insight into program design. Existing scalable approaches to clone detection are limited to finding program fragments that are similar only in their contiguous syntax. Other, semantics-based approaches are more resilient to differences in syntax, such as reordered statements, related statements interleaved with other unrelated statements, or the use of semantically equivalent control structures. However, none of these techniques have scaled to real world code bases. These approaches capture semantic information from Program Dependence Graphs (PDGs), program representations that encode data and control dependencies between statements and predicates. Our definition of a code clone is also based on this representation: we consider program fragments with isomorphic PDGs to be clones. In this paper, we present the first scalable clone detection algorithm based on this definition of semantic clones. Our insight is the reduction of the difficult graph similarity problem to a simpler tree similarity problem by mapping carefully selected PDG subgraphs to their related structured syntax. We efficiently solve the tree similarity problem to create a scalable analysis. We have implemented this algorithm in a practical tool and performed evaluations on several million-line open source projects, including the Linux kernel. Compared with previous approaches, our tool locates significantly more clones, which are often more semantically interesting than simple copied and pasted code fragments.
format	text
author	GABEL, Mark; JIANG, Lingxiao; SU, Zhendong
author_facet	GABEL, Mark; JIANG, Lingxiao; SU, Zhendong
author_sort	GABEL, Mark
title	Scalable detection of semantic clones
title_sort	scalable detection of semantic clones
publisher	Institutional Knowledge at Singapore Management University
publishDate	2008
url	https://ink.library.smu.edu.sg/sis_research/934 https://ink.library.smu.edu.sg/context/sis_research/article/1933/viewcontent/icse08_clone.pdf
_version_	1770570776290787328

Similar Items