Measuring model alignment for code clone detection using causal interpretation

Deep Neural Network-based models have demonstrated high accuracy for semantic code clone detection. However, the lack of generalization poses a threat to the trustworthiness and reliability of these models. Furthermore, the black-box nature of these models makes interpreting the model’s decisions very challenging. Currently, there is only a limited understanding of the semantic code clone detection behavior of existing models. There is a lack of transparency in understanding how a model identifies semantic code clones and the exact code components influencing its prediction. In this paper, we introduce the use of a causal interpretation framework based on the Neyman-Rubin causal model to gain insight into the decision-making of four state-of-the-art clone detection models. Using the causal interpretation framework, we derive causal explanations of models’ decisions by performing interventions guided by expert-labeled data. We measure the alignment of models’ decision-making with expert intuition by evaluating the causal effects of code similarities and differences on the clone predictions of the models. Additionally, we evaluate the similarity intuition alignment, robustness to confounding influences, and prediction consistency of the models. Finally, we rank the models in order of most aligned and thus most reliable to least aligned and thus least reliable for semantic code clone detection. Our contributions lay a foundation for building and evaluating trustworthy semantic code clone detection systems.
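To make the abstract's measurement concrete: in the Neyman-Rubin potential-outcomes framing, the causal effect of a code difference on a model's clone prediction can be estimated by comparing predictions before and after an intervention that edits that difference. The sketch below is a hypothetical toy illustration only, not the paper's implementation: `toy_model` (a token-overlap scorer) and `remove_difference` (an intervention that drops tokens absent from the first fragment) are invented stand-ins for a real clone detector and an expert-guided intervention.

```python
# Hypothetical sketch: average causal effect of an intervention on a clone
# detector's predictions, in the Neyman-Rubin potential-outcomes style.
# `model`, `pairs`, and `intervene` are illustrative stand-ins, not the
# paper's actual models or interventions.

def average_treatment_effect(model, pairs, intervene):
    """Mean change in clone score when the intervention is applied to each pair."""
    control = [model(a, b) for a, b in pairs]              # factual predictions
    treated = [model(*intervene(a, b)) for a, b in pairs]  # counterfactual predictions
    n = len(pairs)
    return sum(treated) / n - sum(control) / n

def toy_model(a, b):
    """Stand-in 'detector': Jaccard similarity over whitespace tokens."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)

def remove_difference(a, b):
    """Stand-in intervention: delete tokens of b that do not occur in a."""
    kept = " ".join(t for t in b.split() if t in set(a.split()))
    return a, kept

# Two toy pairs whose second fragment contains one extra statement.
pairs = [
    ("x = y + 1", "x = y + 1 print(x)"),
    ("total = total + v", "total = total + v log(v)"),
]
ate = average_treatment_effect(toy_model, pairs, remove_difference)
```

A positive `ate` here means removing the differing code raises the model's clone score, i.e. the difference causally lowered the prediction; the paper's alignment measures compare such effects against expert intuition.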

Full description

Saved in:
Bibliographic Details
Main Authors: ABID, Shamsa, CAI, Xuemeng, JIANG, Lingxiao
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2025
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/9927
https://ink.library.smu.edu.sg/context/sis_research/article/10927/viewcontent/clonealignment_emse202412_av.pdf
Institution: Singapore Management University
Language: English
id sg-smu-ink.sis_research-10927
record_format dspace
spelling sg-smu-ink.sis_research-10927 2025-01-10T07:20:26Z Measuring model alignment for code clone detection using causal interpretation ABID, Shamsa CAI, Xuemeng JIANG, Lingxiao Deep Neural Network-based models have demonstrated high accuracy for semantic code clone detection. However, the lack of generalization poses a threat to the trustworthiness and reliability of these models. Furthermore, the black-box nature of these models makes interpreting the model’s decisions very challenging. Currently, there is only a limited understanding of the semantic code clone detection behavior of existing models. There is a lack of transparency in understanding how a model identifies semantic code clones and the exact code components influencing its prediction. In this paper, we introduce the use of a causal interpretation framework based on the Neyman-Rubin causal model to gain insight into the decision-making of four state-of-the-art clone detection models. Using the causal interpretation framework, we derive causal explanations of models’ decisions by performing interventions guided by expert-labeled data. We measure the alignment of models’ decision-making with expert intuition by evaluating the causal effects of code similarities and differences on the clone predictions of the models. Additionally, we evaluate the similarity intuition alignment, robustness to confounding influences, and prediction consistency of the models. Finally, we rank the models in order of most aligned and thus most reliable to least aligned and thus least reliable for semantic code clone detection. Our contributions lay a foundation for building and evaluating trustworthy semantic code clone detection systems.
2025-01-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/9927 info:doi/10.1007/s10664-024-10583-0 https://ink.library.smu.edu.sg/context/sis_research/article/10927/viewcontent/clonealignment_emse202412_av.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Explainable AI model interpretation semantic code clones causal inference model alignment interpreting clone detection Artificial Intelligence and Robotics Software Engineering
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Explainable AI
model interpretation
semantic code clones
causal inference
model alignment
interpreting clone detection
Artificial Intelligence and Robotics
Software Engineering
spellingShingle Explainable AI
model interpretation
semantic code clones
causal inference
model alignment
interpreting clone detection
Artificial Intelligence and Robotics
Software Engineering
ABID, Shamsa
CAI, Xuemeng
JIANG, Lingxiao
Measuring model alignment for code clone detection using causal interpretation
description Deep Neural Network-based models have demonstrated high accuracy for semantic code clone detection. However, the lack of generalization poses a threat to the trustworthiness and reliability of these models. Furthermore, the black-box nature of these models makes interpreting the model’s decisions very challenging. Currently, there is only a limited understanding of the semantic code clone detection behavior of existing models. There is a lack of transparency in understanding how a model identifies semantic code clones and the exact code components influencing its prediction. In this paper, we introduce the use of a causal interpretation framework based on the Neyman-Rubin causal model to gain insight into the decision-making of four state-of-the-art clone detection models. Using the causal interpretation framework, we derive causal explanations of models’ decisions by performing interventions guided by expert-labeled data. We measure the alignment of models’ decision-making with expert intuition by evaluating the causal effects of code similarities and differences on the clone predictions of the models. Additionally, we evaluate the similarity intuition alignment, robustness to confounding influences, and prediction consistency of the models. Finally, we rank the models in order of most aligned and thus most reliable to least aligned and thus least reliable for semantic code clone detection. Our contributions lay a foundation for building and evaluating trustworthy semantic code clone detection systems.
format text
author ABID, Shamsa
CAI, Xuemeng
JIANG, Lingxiao
author_facet ABID, Shamsa
CAI, Xuemeng
JIANG, Lingxiao
author_sort ABID, Shamsa
title Measuring model alignment for code clone detection using causal interpretation
title_short Measuring model alignment for code clone detection using causal interpretation
title_full Measuring model alignment for code clone detection using causal interpretation
title_fullStr Measuring model alignment for code clone detection using causal interpretation
title_full_unstemmed Measuring model alignment for code clone detection using causal interpretation
title_sort measuring model alignment for code clone detection using causal interpretation
publisher Institutional Knowledge at Singapore Management University
publishDate 2025
url https://ink.library.smu.edu.sg/sis_research/9927
https://ink.library.smu.edu.sg/context/sis_research/article/10927/viewcontent/clonealignment_emse202412_av.pdf
_version_ 1821237316729962496