Towards explainable neural network fairness

Neural networks are widely applied in solving many real-world problems. At the same time, they are shown to be vulnerable to attacks, difficult to debug, non-transparent and subject to fairness issues. Discrimination has been observed in various machine learning models, including Large Language Models (LLMs), which calls for systematic fairness evaluation (i.e., testing, verification or even certification) before their deployment in ethics-relevant domains. If a model is found to be discriminating, we must apply systematic measures to improve its fairness. In the literature, multiple categories of fairness-improving methods have been discussed, including pre-processing, in-processing and post-processing. In this dissertation, we aim to develop methods which identify fairness issues in neural networks and mitigate discrimination in a systematic, explainable way.

To achieve this goal, we start by developing a method of explaining how a neural network makes decisions based on simple rules. One factor contributing to fairness concerns is the inherent black-box nature of neural networks, which makes it challenging to discern the rationale behind specific decisions and can result in biased outcomes. In the first work, we therefore focus on explaining neural networks using rules which are not only accurate but also provide insights into the underlying decision-making process. We provide two measurements for neural network decision explainability and develop automated evaluation algorithms.
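
This record does not describe how the first work's rule-based explanations are constructed, so the following is only a minimal sketch of the general technique, assuming a decision-tree surrogate trained to mimic a network's predictions. The model, data and tree depth are hypothetical, and the two reported measurements (fidelity and rule-set size) merely illustrate the kind of explainability metrics mentioned above; this is not the dissertation's algorithm.

```python
# Minimal sketch (not the dissertation's algorithm): approximate a trained
# network's decisions with human-readable rules via a decision-tree surrogate.
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical tabular data and "black-box" network to be explained.
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=500,
                    random_state=0).fit(X, y)

# Surrogate rule model fitted to the network's predictions, not the labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, net.predict(X))

# Two simple explainability measurements: fidelity (how faithfully the rules
# mimic the network) and the number of rules (how compact the explanation is).
fidelity = accuracy_score(net.predict(X), surrogate.predict(X))
print(f"fidelity={fidelity:.3f}, rules={surrogate.get_n_leaves()}")
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(8)]))
```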

In the second research work, we apply the rule-based idea to identify fairness issues that can be explained. We observe that group discrimination is mostly hidden and less studied. We therefore propose an interpretable testing approach which systematically identifies and measures hidden group discrimination of a neural network, characterized by an interpretable rule set whose rules are conditions over combinations of the sensitive features. Specifically, given a neural network, the approach first automatically generates a rule set and then provides an estimated group fairness score to measure the degree of the identified subtle group discrimination, with theoretical error bounds.
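
The estimated group fairness score and its error bounds from the second work are not defined in this record. As a rough illustration only, the sketch below compares positive-prediction rates between a subgroup described by a rule over sensitive features and the rest of the data, and attaches a Hoeffding-style bound; the rule, the stand-in model and the bound are assumptions, not the dissertation's definitions.

```python
# Rough sketch (assumptions throughout): an estimated group fairness score for
# a subgroup described by an interpretable rule over sensitive features, with
# a Hoeffding-style error bound on the estimate.
import math
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical test inputs: columns 0 and 1 are binary sensitive features
# (say, sex and race); the remaining columns are non-sensitive.
X = np.hstack([rng.integers(0, 2, size=(5000, 2)).astype(float),
               rng.normal(size=(5000, 4))])

def model_predict(X):
    """Stand-in for the neural network under test (hypothetical)."""
    return (X[:, 2] + 0.4 * X[:, 0] > 0).astype(int)

# Interpretable rule describing a subgroup, e.g. "sex = 1 AND race = 0".
in_group = (X[:, 0] == 1) & (X[:, 1] == 0)
y_hat = model_predict(X)

# Estimated group fairness score: gap in positive-prediction rates.
score = abs(y_hat[in_group].mean() - y_hat[~in_group].mean())

# Hoeffding bound + union bound over the two subgroups: with probability at
# least 1 - delta, the true gap lies within eps_in + eps_out of the estimate.
delta = 0.05
eps_in = math.sqrt(math.log(4 / delta) / (2 * in_group.sum()))
eps_out = math.sqrt(math.log(4 / delta) / (2 * (~in_group).sum()))
print(f"score={score:.3f} +/- {eps_in + eps_out:.3f} (confidence {1 - delta:.0%})")
```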

In the third research work, we design an approach which explores the causes of fairness issues and mitigates them systematically. Specifically, we first conduct an empirical study which shows that existing fairness improvement methods are not always effective (e.g., they may improve fairness only at the price of a huge accuracy drop) or even not helpful (e.g., they may worsen both fairness and accuracy). We then propose an approach which adaptively chooses the fairness-improving method based on causality analysis, i.e., based on how the neurons and attributes responsible for unfairness are distributed among the input attributes and the hidden neurons.

Lastly, we present a method which allows us to extend our fairness mitigation approach to Large Language Models. As existing bias mitigation research typically does not apply in the era of LLMs, we propose a non-intrusive bias mitigation approach which does not require accessing or modifying the internals of LLMs. Specifically, we propose a parameter-efficient debias adapter that not only improves fairness systematically but also provides a theoretical statistical guarantee on the achieved fairness whenever feasible during debiasing.
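
The causality analysis used in the third work to choose between fairness-improving methods is only named here, not specified. Purely as an illustration, the sketch below estimates how much each input attribute contributes to a demographic-parity gap via a simple permutation-style intervention, and applies a crude, assumed decision rule to suggest pre-processing versus in-processing; a hidden-neuron analysis would proceed analogously but requires access to the network's internals.

```python
# Illustration only (not the dissertation's causality analysis): estimate each
# input attribute's contribution to unfairness with a permutation-style
# intervention, then pick a repair strategy with an assumed decision rule.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4000, 5))
sensitive = (X[:, 0] > 0).astype(int)  # hypothetical sensitive attribute

def model_predict(X_in):
    """Stand-in biased model (hypothetical)."""
    return (0.8 * X_in[:, 0] + X_in[:, 1] > 0).astype(int)

def parity_gap(X_in):
    """Demographic-parity gap w.r.t. the original sensitive groups."""
    y = model_predict(X_in)
    return abs(y[sensitive == 1].mean() - y[sensitive == 0].mean())

baseline = parity_gap(X)

def attribute_effect(j):
    """Drop in unfairness after breaking attribute j's link to the groups."""
    X_do = X.copy()
    X_do[:, j] = rng.permutation(X_do[:, j])
    return baseline - parity_gap(X_do)

effects = {j: attribute_effect(j) for j in range(X.shape[1])}
print({j: round(e, 3) for j, e in effects.items()})

# Assumed rule of thumb: if a single input attribute explains most of the gap,
# repairing the data (pre-processing) may suffice; otherwise prefer a method
# that also adjusts the model itself (in-processing).
strategy = "pre-processing" if max(effects.values()) > 0.5 * baseline else "in-processing"
print("suggested strategy:", strategy)
```
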
Bibliographic Details
Main Author: ZHANG, Mengdi
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2024
Subjects: AI Fairness; Discrimination; Fairness Improvement; Bias Mitigation; AI Testing; Computer Sciences; OS and Networks
Online Access: https://ink.library.smu.edu.sg/etd_coll/547
https://ink.library.smu.edu.sg/context/etd_coll/article/1545/viewcontent/GPIS_AY2019_PhD_Mengdi_Zhang.pdf
Institution: Singapore Management University
Collection: Dissertations and Theses Collection (Open Access), InK@SMU
License: http://creativecommons.org/licenses/by-nc-nd/4.0/