Measuring model alignment for code clone detection using causal interpretation

Deep Neural Network-based models have demonstrated high accuracy for semantic code clone detection. However, the lack of generalization poses a threat to the trustworthiness and reliability of these models. Furthermore, the black-box nature of these models makes interpreting the model’s decisions very challenging. Currently, there is only a limited understanding of the semantic code clone detection behavior of existing models. There is a lack of transparency in understanding how a model identifies semantic code clones and the exact code components influencing its prediction. In this paper, we introduce the use of a causal interpretation framework based on the Neyman-Rubin causal model to gain insight into the decision-making of four state-of-the-art clone detection models. Using the causal interpretation framework, we derive causal explanations of models’ decisions by performing interventions guided by expert-labeled data. We measure the alignment of models’ decision-making with expert intuition by evaluating the causal effects of code similarities and differences on the clone predictions of the models. Additionally, we evaluate the similarity intuition alignment, robustness to confounding influences, and prediction consistency of the models. Finally, we rank the models in order of most aligned and thus most reliable to least aligned and thus least reliable for semantic code clone detection. Our contributions lay a foundation for building and evaluating trustworthy semantic code clone detection systems.
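To make the abstract's measurement concrete: in the Neyman-Rubin potential-outcomes framing, the causal effect of a code difference on a model's clone prediction can be estimated by comparing predictions before and after an intervention that edits that difference. The sketch below is a hypothetical toy illustration only, not the paper's implementation: `toy_model` (a token-overlap scorer) and `remove_difference` (an intervention that drops tokens absent from the first fragment) are invented stand-ins for a real clone detector and an expert-guided intervention.

```python
# Hypothetical sketch: average causal effect of an intervention on a clone
# detector's predictions, in the Neyman-Rubin potential-outcomes style.
# `model`, `pairs`, and `intervene` are illustrative stand-ins, not the
# paper's actual models or interventions.

def average_treatment_effect(model, pairs, intervene):
    """Mean change in clone score when the intervention is applied to each pair."""
    control = [model(a, b) for a, b in pairs]              # factual predictions
    treated = [model(*intervene(a, b)) for a, b in pairs]  # counterfactual predictions
    n = len(pairs)
    return sum(treated) / n - sum(control) / n

def toy_model(a, b):
    """Stand-in 'detector': Jaccard similarity over whitespace tokens."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)

def remove_difference(a, b):
    """Stand-in intervention: delete tokens of b that do not occur in a."""
    kept = " ".join(t for t in b.split() if t in set(a.split()))
    return a, kept

# Two toy pairs whose second fragment contains one extra statement.
pairs = [
    ("x = y + 1", "x = y + 1 print(x)"),
    ("total = total + v", "total = total + v log(v)"),
]
ate = average_treatment_effect(toy_model, pairs, remove_difference)
```

A positive `ate` here means removing the differing code raises the model's clone score, i.e. the difference causally lowered the prediction; the paper's alignment measures compare such effects against expert intuition.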

Full description

Saved in:
Bibliographic Details
Main Authors: ABID, Shamsa, CAI, Xuemeng, JIANG, Lingxiao
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2025
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/9927
https://ink.library.smu.edu.sg/context/sis_research/article/10927/viewcontent/clonealignment_emse202412_av.pdf
Institution: Singapore Management University
Language: English
id sg-smu-ink.sis_research-10927
record_format dspace
spelling sg-smu-ink.sis_research-10927 2025-01-10T07:20:26Z Measuring model alignment for code clone detection using causal interpretation ABID, Shamsa CAI, Xuemeng JIANG, Lingxiao Deep Neural Network-based models have demonstrated high accuracy for semantic code clone detection. However, the lack of generalization poses a threat to the trustworthiness and reliability of these models. Furthermore, the black-box nature of these models makes interpreting the model’s decisions very challenging. Currently, there is only a limited understanding of the semantic code clone detection behavior of existing models. There is a lack of transparency in understanding how a model identifies semantic code clones and the exact code components influencing its prediction. In this paper, we introduce the use of a causal interpretation framework based on the Neyman-Rubin causal model to gain insight into the decision-making of four state-of-the-art clone detection models. Using the causal interpretation framework, we derive causal explanations of models’ decisions by performing interventions guided by expert-labeled data. We measure the alignment of models’ decision-making with expert intuition by evaluating the causal effects of code similarities and differences on the clone predictions of the models. Additionally, we evaluate the similarity intuition alignment, robustness to confounding influences, and prediction consistency of the models. Finally, we rank the models in order of most aligned and thus most reliable to least aligned and thus least reliable for semantic code clone detection. Our contributions lay a foundation for building and evaluating trustworthy semantic code clone detection systems.
2025-01-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/9927 info:doi/10.1007/s10664-024-10583-0 https://ink.library.smu.edu.sg/context/sis_research/article/10927/viewcontent/clonealignment_emse202412_av.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Explainable AI model interpretation semantic code clones causal inference model alignment interpreting clone detection Artificial Intelligence and Robotics Software Engineering
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Explainable AI
model interpretation
semantic code clones
causal inference
model alignment
interpreting clone detection
Artificial Intelligence and Robotics
Software Engineering
spellingShingle Explainable AI
model interpretation
semantic code clones
causal inference
model alignment
interpreting clone detection
Artificial Intelligence and Robotics
Software Engineering
ABID, Shamsa
CAI, Xuemeng
JIANG, Lingxiao
Measuring model alignment for code clone detection using causal interpretation
description Deep Neural Network-based models have demonstrated high accuracy for semantic code clone detection. However, the lack of generalization poses a threat to the trustworthiness and reliability of these models. Furthermore, the black-box nature of these models makes interpreting the model’s decisions very challenging. Currently, there is only a limited understanding of the semantic code clone detection behavior of existing models. There is a lack of transparency in understanding how a model identifies semantic code clones and the exact code components influencing its prediction. In this paper, we introduce the use of a causal interpretation framework based on the Neyman-Rubin causal model to gain insight into the decision-making of four state-of-the-art clone detection models. Using the causal interpretation framework, we derive causal explanations of models’ decisions by performing interventions guided by expert-labeled data. We measure the alignment of models’ decision-making with expert intuition by evaluating the causal effects of code similarities and differences on the clone predictions of the models. Additionally, we evaluate the similarity intuition alignment, robustness to confounding influences, and prediction consistency of the models. Finally, we rank the models in order of most aligned and thus most reliable to least aligned and thus least reliable for semantic code clone detection. Our contributions lay a foundation for building and evaluating trustworthy semantic code clone detection systems.
format text
author ABID, Shamsa
CAI, Xuemeng
JIANG, Lingxiao
author_facet ABID, Shamsa
CAI, Xuemeng
JIANG, Lingxiao
author_sort ABID, Shamsa
title Measuring model alignment for code clone detection using causal interpretation
title_short Measuring model alignment for code clone detection using causal interpretation
title_full Measuring model alignment for code clone detection using causal interpretation
title_fullStr Measuring model alignment for code clone detection using causal interpretation
title_full_unstemmed Measuring model alignment for code clone detection using causal interpretation
title_sort measuring model alignment for code clone detection using causal interpretation
publisher Institutional Knowledge at Singapore Management University
publishDate 2025
url https://ink.library.smu.edu.sg/sis_research/9927
https://ink.library.smu.edu.sg/context/sis_research/article/10927/viewcontent/clonealignment_emse202412_av.pdf
_version_ 1821237316729962496