DECKARD: Scalable and accurate tree-based detection of code clones

Detecting code clones has many software engineering applications. Existing approaches either do not scale to large code bases or are not robust against minor code modifications. In this paper, we present an efficient algorithm for identifying similar subtrees and apply it to tree representations of...

Bibliographic Details
Main Authors: JIANG, Lingxiao, MISHERGHI, Ghassan, SU, Zhendong, GLONDU, Stephane
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2007
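Subjects: detection of code clones; software engineering applications; efficient algorithm; Software Engineering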
Online Access: https://ink.library.smu.edu.sg/sis_research/1011
https://ink.library.smu.edu.sg/context/sis_research/article/2010/viewcontent/JIANGLXdeckard_icse07.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-2010
record_format dspace
spelling sg-smu-ink.sis_research-2010
2021-03-12T08:04:48Z
2007-05-01T07:00:00Z
text
application/pdf
info:doi/10.1109/ICSE.2007.30
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic detection of code clones
software engineering applications
efficient algorithm
Software Engineering
description Detecting code clones has many software engineering applications. Existing approaches either do not scale to large code bases or are not robust against minor code modifications. In this paper, we present an efficient algorithm for identifying similar subtrees and apply it to tree representations of source code. Our algorithm is based on a novel characterization of subtrees with numerical vectors in the Euclidean space R^n and an efficient algorithm to cluster these vectors w.r.t. the Euclidean distance metric. Subtrees with vectors in one cluster are considered similar. We have implemented our tree similarity algorithm as a clone detection tool called DECKARD and evaluated it on large code bases written in C and Java, including the Linux kernel and JDK. Our experiments show that DECKARD is both scalable and accurate. It is also language independent, applicable to any language with a formally specified grammar.
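The idea in the description (map each subtree to a numerical vector, then treat subtrees whose vectors are close in Euclidean distance as similar) can be illustrated with a small sketch. This is not the DECKARD implementation. Several choices here are assumptions made only for illustration: Python's standard `ast` module supplies the trees, each vector simply counts node types in a subtree, and a naive pairwise comparison over function definitions stands in for the efficient clustering step the record mentions. The names `subtree_vectors`, `euclidean`, `function_clones`, and `SAMPLE` are hypothetical.

```python
import ast
import math
from collections import Counter
from itertools import combinations


def subtree_vectors(source, min_nodes=5):
    """Return (node, vector) pairs, one per subtree with at least min_nodes nodes.

    Each vector is a Counter of AST node-type names (an assumed characterization).
    """
    results = []

    def visit(node):
        # Post-order: a subtree's vector is its own type plus its children's vectors.
        vec = Counter({type(node).__name__: 1})
        for child in ast.iter_child_nodes(node):
            vec += visit(child)
        if sum(vec.values()) >= min_nodes:
            results.append((node, vec))
        return vec

    visit(ast.parse(source))
    return results


def euclidean(u, v):
    """Euclidean distance between two sparse count vectors."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u[k] - v[k]) ** 2 for k in keys))


def function_clones(source, max_dist=1.0):
    """Naive O(n^2) stand-in for clustering: pair up functions with close vectors."""
    funcs = [(n, v) for n, v in subtree_vectors(source)
             if isinstance(n, ast.FunctionDef)]
    return [(a.name, b.name)
            for (a, va), (b, vb) in combinations(funcs, 2)
            if euclidean(va, vb) <= max_dist]


SAMPLE = '''
def total(xs):
    acc = 0
    for x in xs:
        acc = acc + x
    return acc

def summed(values):
    result = 0
    for v in values:
        result = result + v
    return result

def shout(s):
    return s.upper() + "!"
'''

if __name__ == "__main__":
    print(function_clones(SAMPLE))  # [('total', 'summed')]
```

Because the vectors count node types rather than identifier names, `total` and `summed` map to the same point even though every variable is renamed, which is the kind of robustness to minor modifications the description refers to. The pairwise loop is quadratic and would not scale to large code bases; it is used here only to keep the sketch short.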
format text
author JIANG, Lingxiao
MISHERGHI, Ghassan
SU, Zhendong
GLONDU, Stephane
title DECKARD: Scalable and accurate tree-based detection of code clones
publisher Institutional Knowledge at Singapore Management University
publishDate 2007
url https://ink.library.smu.edu.sg/sis_research/1011
https://ink.library.smu.edu.sg/context/sis_research/article/2010/viewcontent/JIANGLXdeckard_icse07.pdf