Measuring model alignment for code clone detection using causal interpretation
Deep Neural Network-based models have demonstrated high accuracy for semantic code clone detection. However, the lack of generalization poses a threat to the trustworthiness and reliability of these models. Furthermore, the black-box nature of these models makes interpreting the model’s decisions very challenging. Currently, there is only a limited understanding of the semantic code clone detection behavior of existing models. There is a lack of transparency in understanding how a model identifies semantic code clones and the exact code components influencing its prediction. In this paper, we introduce the use of a causal interpretation framework based on the Neyman-Rubin causal model to gain insight into the decision-making of four state-of-the-art clone detection models. Using the causal interpretation framework, we derive causal explanations of models’ decisions by performing interventions guided by expert-labeled data. We measure the alignment of models’ decision-making with expert intuition by evaluating the causal effects of code similarities and differences on the clone predictions of the models. Additionally, we evaluate the similarity intuition alignment, robustness to confounding influences, and prediction consistency of the models. Finally, we rank the models in order of most aligned and thus most reliable to least aligned and thus least reliable for semantic code clone detection. Our contributions lay a foundation for building and evaluating trustworthy semantic code clone detection systems.
Saved in:
Main Authors: ABID, Shamsa; CAI, Xuemeng; JIANG, Lingxiao
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2025
Subjects: Explainable AI; model interpretation; semantic code clones; causal inference; model alignment; interpreting clone detection; Artificial Intelligence and Robotics; Software Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/9927 https://ink.library.smu.edu.sg/context/sis_research/article/10927/viewcontent/clonealignment_emse202412_av.pdf
Institution: Singapore Management University
id: sg-smu-ink.sis_research-10927
record_format: dspace
spelling:
sg-smu-ink.sis_research-109272025-01-10T07:20:26Z Measuring model alignment for code clone detection using causal interpretation ABID, Shamsa CAI, Xuemeng JIANG, Lingxiao Deep Neural Network-based models have demonstrated high accuracy for semantic code clone detection. However, the lack of generalization poses a threat to the trustworthiness and reliability of these models. Furthermore, the black-box nature of these models makes interpreting the model’s decisions very challenging. Currently, there is only a limited understanding of the semantic code clone detection behavior of existing models. There is a lack of transparency in understanding how a model identifies semantic code clones and the exact code components influencing its prediction. In this paper, we introduce the use of a causal interpretation framework based on the Neyman-Rubin causal model to gain insight into the decision-making of four state-of-the-art clone detection models. Using the causal interpretation framework, we derive causal explanations of models’ decisions by performing interventions guided by expert-labeled data. We measure the alignment of models’ decision-making with expert intuition by evaluating the causal effects of code similarities and differences on the clone predictions of the models. Additionally, we evaluate the similarity intuition alignment, robustness to confounding influences, and prediction consistency of the models. Finally, we rank the models in order of most aligned and thus most reliable to least aligned and thus least reliable for semantic code clone detection. Our contributions lay a foundation for building and evaluating trustworthy semantic code clone detection systems. 
2025-01-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/9927 info:doi/10.1007/s10664-024-10583-0 https://ink.library.smu.edu.sg/context/sis_research/article/10927/viewcontent/clonealignment_emse202412_av.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Explainable AI model interpretation semantic code clones causal inference model alignment interpreting clone detection Artificial Intelligence and Robotics Software Engineering
institution: Singapore Management University
building: SMU Libraries
continent: Asia
country: Singapore
content_provider: SMU Libraries
collection: InK@SMU
language: English
topic: Explainable AI model interpretation semantic code clones causal inference model alignment interpreting clone detection Artificial Intelligence and Robotics Software Engineering
spellingShingle: Explainable AI model interpretation semantic code clones causal inference model alignment interpreting clone detection Artificial Intelligence and Robotics Software Engineering ABID, Shamsa CAI, Xuemeng JIANG, Lingxiao Measuring model alignment for code clone detection using causal interpretation
description: Deep Neural Network-based models have demonstrated high accuracy for semantic code clone detection. However, the lack of generalization poses a threat to the trustworthiness and reliability of these models. Furthermore, the black-box nature of these models makes interpreting the model’s decisions very challenging. Currently, there is only a limited understanding of the semantic code clone detection behavior of existing models. There is a lack of transparency in understanding how a model identifies semantic code clones and the exact code components influencing its prediction. In this paper, we introduce the use of a causal interpretation framework based on the Neyman-Rubin causal model to gain insight into the decision-making of four state-of-the-art clone detection models. Using the causal interpretation framework, we derive causal explanations of models’ decisions by performing interventions guided by expert-labeled data. We measure the alignment of models’ decision-making with expert intuition by evaluating the causal effects of code similarities and differences on the clone predictions of the models. Additionally, we evaluate the similarity intuition alignment, robustness to confounding influences, and prediction consistency of the models. Finally, we rank the models in order of most aligned and thus most reliable to least aligned and thus least reliable for semantic code clone detection. Our contributions lay a foundation for building and evaluating trustworthy semantic code clone detection systems.
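The abstract describes measuring the causal effect of code interventions on a model's clone predictions under the Neyman-Rubin framework. A minimal illustrative sketch of that idea is below; the token-overlap "model" and the identifier-renaming intervention are toy stand-ins invented here, not the paper's actual models, interventions, or data.

```python
# Sketch: average effect of an intervention on a clone detector's predictions,
# in the spirit of the Neyman-Rubin causal model. Everything here is a toy
# stand-in for illustration only.

def toy_clone_model(code_a: str, code_b: str) -> float:
    """Stand-in detector: Jaccard token overlap as a clone probability."""
    tokens_a, tokens_b = set(code_a.split()), set(code_b.split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def causal_effect(model, pairs, intervene):
    """Average treatment effect of an intervention on model predictions.

    For each code pair, compare the factual prediction with the
    counterfactual prediction after intervening on the second snippet.
    """
    effects = []
    for code_a, code_b in pairs:
        factual = model(code_a, code_b)
        counterfactual = model(code_a, intervene(code_b))
        effects.append(counterfactual - factual)
    return sum(effects) / len(effects)

# Hypothetical intervention: align an identifier name across the pair,
# i.e. remove one code difference and see how the prediction shifts.
rename = lambda code: code.replace("total", "sum")

pairs = [("sum = a + b", "total = a + b"),
         ("sum = x * y", "total = x * y")]

ate = causal_effect(toy_clone_model, pairs, rename)
print(round(ate, 2))  # prints 0.33: removing the difference raises predicted similarity
```

A positive effect means the removed difference was suppressing the clone prediction; in the paper's terms, comparing such effects against expert-labeled similarities and differences is what quantifies a model's alignment with expert intuition.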
format: text
author: ABID, Shamsa CAI, Xuemeng JIANG, Lingxiao
author_facet: ABID, Shamsa CAI, Xuemeng JIANG, Lingxiao
author_sort: ABID, Shamsa
title: Measuring model alignment for code clone detection using causal interpretation
title_short: Measuring model alignment for code clone detection using causal interpretation
title_full: Measuring model alignment for code clone detection using causal interpretation
title_fullStr: Measuring model alignment for code clone detection using causal interpretation
title_full_unstemmed: Measuring model alignment for code clone detection using causal interpretation
title_sort: measuring model alignment for code clone detection using causal interpretation
publisher: Institutional Knowledge at Singapore Management University
publishDate: 2025
url: https://ink.library.smu.edu.sg/sis_research/9927 https://ink.library.smu.edu.sg/context/sis_research/article/10927/viewcontent/clonealignment_emse202412_av.pdf
_version_: 1821237316729962496