Counterfactual explanations for machine learning models on heterogeneous data


Bibliographic Details
Main Author: Wang, Yongjie
Other Authors: Miao Chun Yan
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2023
Subjects: Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/169968
DOI: 10.32657/10356/169968
License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
Citation: Wang, Y. (2023). Counterfactual explanations for machine learning models on heterogeneous data. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/169968
Institution: Nanyang Technological University
Full description

Counterfactual explanation aims to identify the minimal and meaningful changes required in an input instance to produce a different prediction from a given model. Counterfactual explanations can help users understand the model's current prediction, detect model unfairness, and provide actionable recommendations to users who receive an undesired prediction. Consequently, counterfactual explanations have diverse applications in fields such as education, finance, marketing, and healthcare.

The counterfactual explanation problem is formulated as a constrained optimization problem whose goal is to minimize the cost between the input and the counterfactual explanation subject to certain constraints. Existing research has mainly focused on two areas: incorporating practical constraints and introducing various solving methods. However, counterfactual explanations are still far from practical deployment. In this thesis, we improve on this problem from the angles of trust, actionability, and safety, making counterfactual explanations more deployable.
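For concreteness, a common way to write this optimization (a generic sketch in the style of the standard counterfactual literature; the thesis's exact cost function and constraint set may differ) is

    \min_{x'} \; \mathrm{cost}(x, x')
    \quad \text{subject to} \quad f(x') = y' \neq f(x), \quad x' \in \mathcal{A},

where $x$ is the input instance, $f$ is the model, $y'$ is the desired outcome, and $\mathcal{A}$ denotes the set of plausible, actionable instances.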
One goal of counterfactual explanations is to seek action suggestions from the model. However, commonly used models such as ensembles and neural networks are black boxes with poor trustworthiness. Explaining the model can improve its trustworthiness; yet global explanations are too general to apply to every instance, while examining all local explanations one by one is a burden. We therefore propose a group-level summarization method that finds $k$ groups in a feature importance matrix, where each group is summarized by its distinct top-$l$ important features. This provides a compact summary that makes the model easier to understand and inspect.
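To illustrate the group-level idea (a hypothetical sketch, not the thesis's actual algorithm: here the rows of a local feature-importance matrix are clustered with k-means, and each group is described by its top-$l$ mean-importance features; all names are illustrative):

    # imp: (n_instances x n_features) local importance matrix, e.g. from SHAP.
    import numpy as np
    from sklearn.cluster import KMeans

    def summarize_importances(imp, feature_names, k=3, l=2):
        # Assign each instance's importance vector to one of k groups.
        groups = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(imp)
        summary = {}
        for g in range(k):
            # Summarize the group by its l features of highest mean importance.
            mean_imp = imp[groups == g].mean(axis=0)
            top = np.argsort(mean_imp)[::-1][:l]
            summary[g] = [feature_names[i] for i in top]
        return summary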
In real-life applications, it is difficult to compare changes across heterogeneous features with a single scalar cost function. Moreover, existing methods do not support interactive exploration by users. To address these issues, we propose a skyline method that treats the change in each incomparable feature as a separate objective to minimize and finds a set of non-dominated counterfactual explanations. Users can then interactively refine their requirements over this non-dominated set. Our experiments demonstrate that our method provides superior results compared with state-of-the-art methods.
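The skyline notion can be illustrated with a standard Pareto-dominance filter (a minimal sketch under assumed inputs; function and variable names are hypothetical): a candidate counterfactual is kept only if no other candidate changes every feature at most as much and at least one feature strictly less.

    import numpy as np

    def skyline(costs):
        # costs: (n_candidates x n_objectives), one per-feature change
        # magnitude per column; smaller is better in every column.
        costs = np.asarray(costs)
        keep = []
        for i, c in enumerate(costs):
            dominated = any(
                np.all(costs[j] <= c) and np.any(costs[j] < c)
                for j in range(len(costs)) if j != i
            )
            if not dominated:
                keep.append(i)
        return keep  # indices of non-dominated candidates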
Model security and privacy are critical concerns for model owners who want to deploy a counterfactual explanation service, yet these issues have received little attention in the literature. To address this gap, we propose an efficient and effective attack that extracts the pretrained model through counterfactual explanations (CFs). Specifically, our method treats CFs as ordinary queries to obtain counterfactual explanations of counterfactual explanations (CCFs), and then trains a substitute model on pairs of CFs and CCFs. Experiments show that our approach obtains a substitute model with higher agreement.
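A minimal sketch of such an attack loop, assuming a victim service that exposes a prediction endpoint and an explanation endpoint (query_predict and query_cf are hypothetical interfaces, and the substitute architecture is an arbitrary choice, not the one used in the thesis):

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def extract(query_predict, query_cf, seeds):
        X, y = [], []
        for x in seeds:
            cf = query_cf(x)    # CF of the seed crosses the decision boundary
            ccf = query_cf(cf)  # CF of the CF crosses back again
            for q in (x, cf, ccf):
                X.append(q)
                y.append(query_predict(q))
        # Fit a substitute model on the boundary-straddling query pairs.
        sub = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500)
        sub.fit(np.array(X), np.array(y))
        return sub

Because CFs and CCFs lie close to the decision boundary on opposite sides, such pairs are especially informative training points for the substitute.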
In summary, our research helps to bridge the gap between the theoretical understanding and the practical use of counterfactual explanations, and it provides valuable insights for researchers and practitioners in various domains.