C³: Code clone-based identification of duplicated components

Reinventing the wheel is a detrimental programming practice in software development that frequently results in the introduction of duplicated components. This practice not only leads to increased maintenance and labor costs but also poses a higher risk of propagating bugs throughout the system. Desp...

Bibliographic Details
Main Authors: YANG, Yanming; ZOU, Ying; HU, Xing; LO, David; NI, Chao; GRUNDY, John C.; XIA, Xin
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2023
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/8575
https://ink.library.smu.edu.sg/context/sis_research/article/9578/viewcontent/C3__Code_Clone_based_Identification_of_Duplicated_Components.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-9578
record_format dspace
spelling sg-smu-ink.sis_research-9578 2024-01-25T08:58:43Z 2023-12-01T08:00:00Z text application/pdf info:doi/10.1145/3611643.3613883 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Clone detection
Code clone
Community detection algorithms
Component levels
Component-level clone detection
Component-level clone metric
Maintenance cost
Programming practices
Refactorings
Databases and Information Systems
Software Engineering
Theory and Algorithms
description Reinventing the wheel is a detrimental programming practice that frequently introduces duplicated components. This practice not only increases maintenance and labor costs but also raises the risk of propagating bugs throughout the system. Despite the numerous issues that duplicated components introduce, identifying component-level clones remains a significant challenge that existing studies struggle to tackle effectively. Specifically, existing methods face two primary limitations: 1) measuring the similarity between components is difficult because components differ significantly in size; 2) identifying functional clones is complex because the primary functionality of a component is hard to determine. To overcome these challenges, we present a novel approach named C3 (Component-level Code Clone detector) that identifies both textual and functional cloned components. In addition, to make eliminating cloned components more efficient, we develop an assessment method based on six component-level clone features, which helps developers prioritize cloned components by refactoring necessity. To validate the effectiveness of C3, we apply it to a large-scale industrial product developed by Huawei, a prominent global ICT company. Our experimental results demonstrate that C3 accurately detects cloned components, achieving a precision of 0.93, a recall of 0.91, and an F1-score of 0.9. We also conduct a comprehensive user study to further validate the effectiveness and practicality of our assessment method and the proposed clone features in assessing the refactoring necessity of different cloned components. The study shows solid alignment between assessment outcomes and participant responses, indicating that our method accurately prioritizes cloned components with high refactoring necessity. This finding further confirms the usefulness of the six "golden features" in our assessment.