Counterfactual explanations for machine learning models on heterogeneous data

Bibliographic Details
Main Author: Wang, Yongjie
Other Authors: Miao Chun Yan
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2023
Online Access: https://hdl.handle.net/10356/169968
Institution: Nanyang Technological University
Description
Summary: Counterfactual explanation aims to identify the minimal and meaningful changes to an input instance that would produce a different prediction from a given model. Counterfactual explanations can help users understand the model's current prediction, detect model unfairness, and obtain actionable recommendations after receiving an undesired prediction. Consequently, they have diverse applications in fields such as education, finance, marketing, and healthcare. The counterfactual explanation problem is formulated as a constrained optimization problem, where the goal is to minimize the cost of changing the input into its counterfactual explanation, subject to certain constraints. Existing research has mainly focused on two areas: incorporating practical constraints and introducing various solving methods. However, counterfactual explanations are still far from practical deployment. In this thesis, we address the problem from the angles of trust, actionability, and safety, thus making counterfactual explanations more deployable.

One goal of counterfactual explanations is to seek action suggestions from the model. However, commonly used models such as ensemble models and neural networks are black boxes with poor trustworthiness. Explaining the model can improve its trustworthiness. Yet global explanations are too general to apply to every instance, and examining local explanations one by one is burdensome. Therefore, we propose a group-level summarization method that, given a feature importance matrix, finds $k$ groups, each summarized by its distinct top-$l$ important features. This approach provides a compact summary that makes the model easier to understand and inspect.

In real-life applications, it is difficult to compare changes in heterogeneous features with a scalar cost function. Moreover, existing methods do not support interactive exploration by users. To address these limitations, we propose a skyline method that treats the change in each incomparable feature as a separate objective to minimize and finds a set of non-dominated counterfactual explanations. Users can interactively refine their requirements within this non-dominated set. Our experiments demonstrate that our method provides superior results compared to state-of-the-art methods.

Model security and privacy are critical concerns for model owners who want to deploy a counterfactual explanation service. However, these issues have not received much attention in the literature. To address this gap, we propose an efficient and effective attack method that can extract the pretrained model through counterfactual explanations (CFs). Specifically, our method treats CFs as ordinary queries to find counterfactual explanations of counterfactual explanations (CCFs) and then trains a substitute model using pairs of CFs and CCFs. Experiments reveal that our approach can obtain a substitute model with higher agreement.

In summary, our research helps to bridge the gap between the theoretical understanding and the practical use of counterfactual explanations and provides valuable insights for researchers and practitioners in various domains.
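
The notion of a non-dominated set used by the skyline method in the summary can be illustrated with a minimal sketch. This is not the thesis's algorithm: it is only the generic Pareto dominance filter, assuming each candidate counterfactual has already been reduced to a tuple of per-feature change costs where lower is better. The names `dominates`, `skyline`, and `candidates`, and the cost values themselves, are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if cost vector a dominates b: a is no worse than b on
    every feature and strictly better on at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(costs: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of cost vectors that no other vector dominates.

    Simple O(n^2) filter; adequate for a small candidate set.
    """
    return [
        i
        for i, a in enumerate(costs)
        if not any(dominates(b, a) for j, b in enumerate(costs) if j != i)
    ]


# Hypothetical candidates: (change in feature 1, change in feature 2).
# The two features are assumed incomparable, so they are never summed.
candidates = [
    (1.0, 3.0),
    (2.0, 2.0),
    (3.0, 1.0),
    (3.0, 3.0),  # dominated by (2.0, 2.0)
    (2.0, 2.5),  # dominated by (2.0, 2.0)
]

result = skyline(candidates)
print(result)
```

Because the per-feature costs are kept as separate coordinates, no scalar weighting between heterogeneous features is needed; a user could then narrow the returned set by adding bounds on individual features.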