Towards explainable neural network fairness

Bibliographic Details
Main Author: ZHANG, Mengdi
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2024
Subjects:
Online Access:https://ink.library.smu.edu.sg/etd_coll/547
https://ink.library.smu.edu.sg/context/etd_coll/article/1545/viewcontent/GPIS_AY2019_PhD_Mengdi_Zhang.pdf
Institution: Singapore Management University
Description
Summary: Neural networks are widely applied in solving many real-world problems. At the same time, they are shown to be vulnerable to attacks, difficult to debug, non-transparent and subject to fairness issues. Discrimination has been observed in various machine learning models, including Large Language Models (LLMs), which calls for systematic fairness evaluation (i.e., testing, verification or even certification) before their deployment in ethics-relevant domains. If a model is found to be discriminating, we must apply systematic measures to improve its fairness. In the literature, multiple categories of fairness-improving methods have been discussed, including pre-processing, in-processing and post-processing.

In this dissertation, we aim to develop methods which identify fairness issues in neural networks and mitigate discrimination in a systematic, explainable way. To achieve this goal, we start by developing a method of explaining how a neural network makes decisions based on simple rules. One factor contributing to fairness concerns is the inherent black-box nature of neural networks, which makes it challenging to discern the rationale behind specific decisions and can result in biased outcomes. In the first work, we therefore focus on explaining neural networks using rules which are not only accurate but also provide insights into the underlying decision-making process. We provide two measurements of neural network decision explainability and develop automated evaluation algorithms.

In the second research work, we apply the rule-based idea to identify fairness issues that can be explained. We observe that group discrimination is mostly hidden and less studied. We therefore propose an interpretable testing approach which systematically identifies and measures hidden group discrimination of a neural network, characterized by an interpretable rule set whose rules are conditions over combinations of the sensitive features. Specifically, given a neural network, the approach first automatically generates a rule set and then provides an estimated group fairness score, with theoretical error bounds, to measure the degree of the identified subtle group discrimination.

In the third research work, we design an approach which explores the causes of fairness issues and mitigates them systematically. We first conduct an empirical study showing that existing fairness improvement methods are not always effective (e.g., they may improve fairness at the price of a large accuracy drop) or even helpful (e.g., they may worsen both fairness and accuracy). We then propose an approach which adaptively chooses the fairness-improving method based on causality analysis, i.e., based on how the neurons and attributes responsible for unfairness are distributed among the input attributes and the hidden neurons.

Lastly, we present a method which allows us to extend our fairness mitigation approach to Large Language Models. As existing bias mitigation techniques typically do not apply in the era of LLMs, we propose a non-intrusive bias mitigation approach which does not require accessing or modifying the internals of LLMs. Specifically, we propose a parameter-efficient debias adapter that not only improves fairness systematically but also provides a theoretical statistical guarantee on the achieved fairness whenever such a guarantee is feasible during debiasing.
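
To make the rule-based explanation idea in the first work concrete, the following is a minimal sketch, under assumptions of our own, of one common way to obtain simple rules for a trained network: fit a shallow decision tree to the network's own predictions and then check how faithfully the extracted rules reproduce them. The dataset, model sizes and the fidelity/accuracy measurements are illustrative stand-ins, not the dissertation's algorithms or metrics.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Train a small neural network on synthetic tabular data (illustrative only).
    X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
    net = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0).fit(X, y)

    # Surrogate rules: a shallow tree fitted to the *network's* predictions, so the
    # extracted rules describe the network's decision process rather than the labels.
    net_preds = net.predict(X)
    rules = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, net_preds)

    fidelity = np.mean(rules.predict(X) == net_preds)  # how often rules agree with the network
    accuracy = np.mean(rules.predict(X) == y)          # how often rules agree with ground truth
    print(f"rule fidelity to network: {fidelity:.3f}, rule accuracy: {accuracy:.3f}")
    print(export_text(rules, feature_names=[f"f{i}" for i in range(X.shape[1])]))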
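The second work reports an estimated group fairness score with theoretical error bounds. The sketch below shows one generic way such a score and bound could be computed, assuming the score is the gap in favourable-outcome rates between a rule-defined group and its complement and the bound is a Hoeffding-style confidence radius; the function names, the rule and the random data are hypothetical and not necessarily what the dissertation uses.

    import numpy as np

    def hoeffding_radius(n, delta=0.05):
        # With probability >= 1 - delta, the empirical mean of n samples of a
        # [0, 1]-bounded variable lies within this radius of the true mean.
        return np.sqrt(np.log(2.0 / delta) / (2.0 * n))

    def group_fairness_score(preds, in_group, delta=0.05):
        # preds: 0/1 favourable-outcome predictions of the model under test.
        # in_group: boolean mask produced by a rule over sensitive features
        # (e.g. gender == "F" and age < 30) -- hypothetical here.
        p_in, p_out = preds[in_group], preds[~in_group]
        score = abs(p_in.mean() - p_out.mean())
        # Union bound: split delta across the two group-wise estimates.
        radius = hoeffding_radius(len(p_in), delta / 2) + hoeffding_radius(len(p_out), delta / 2)
        return score, radius

    # Toy usage with random stand-ins for model predictions and group membership.
    rng = np.random.default_rng(0)
    preds = rng.integers(0, 2, size=10_000)
    in_group = rng.random(10_000) < 0.3
    score, radius = group_fairness_score(preds, in_group)
    print(f"estimated group fairness score: {score:.3f} +/- {radius:.3f} (95% confidence)")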
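For the last part, the general notion of a non-intrusive, parameter-efficient debias adapter can be illustrated as follows: keep a (stand-in) frozen encoder untouched and train only a small adapter on top of its embeddings with a task loss plus a fairness penalty. Everything here, including the frozen encoder stand-in, the adapter shape and the fairness weight, is an assumption for illustration; it does not reproduce the dissertation's adapter or its statistical guarantee.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Stand-in for a frozen LLM: a fixed embedding function whose parameters we never update.
    frozen_encoder = nn.Linear(16, 64)
    for p in frozen_encoder.parameters():
        p.requires_grad_(False)

    # Small trainable adapter on top of the frozen embeddings (the parameter-efficient part).
    adapter = nn.Sequential(nn.Linear(64, 16), nn.ReLU(), nn.Linear(16, 1))
    opt = torch.optim.Adam(adapter.parameters(), lr=1e-2)

    x = torch.randn(512, 16)                    # toy inputs
    y = torch.randint(0, 2, (512, 1)).float()   # toy task labels
    group = torch.randint(0, 2, (512,)).bool()  # toy sensitive-group membership

    for _ in range(200):
        opt.zero_grad()
        logits = adapter(frozen_encoder(x))
        task_loss = nn.functional.binary_cross_entropy_with_logits(logits, y)
        probs = torch.sigmoid(logits).squeeze(1)
        fairness_gap = (probs[group].mean() - probs[~group].mean()).abs()
        (task_loss + 1.0 * fairness_gap).backward()  # 1.0 = assumed fairness weight
        opt.step()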