Towards explainable neural network fairness

Neural networks are widely applied in solving many real-world problems. At the same time, they are shown to be vulnerable to attacks, difficult to debug, non-transparent and subject to fairness issues. Discrimination has been observed in various machine learning models, including Large Language Models (LLMs), which calls for systematic fairness evaluation (i.e., testing, verification or even certification) before their deployment in ethics-relevant domains. If a model is found to be discriminating, we must apply systematic measures to improve its fairness. In the literature, multiple categories of fairness-improving methods have been discussed, including pre-processing, in-processing and post-processing. In this dissertation, we aim to develop methods which identify fairness issues in neural networks and mitigate discrimination in a systematic, explainable way.

To achieve this goal, we start by developing a method of explaining how a neural network makes decisions based on simple rules. One factor contributing to fairness concerns is the inherent black-box nature of neural networks, which makes it challenging to discern the rationale behind specific decisions and can result in biased outcomes. In the first work, we therefore focus on explaining neural networks using rules which are not only accurate but also provide insights into the underlying decision-making process. We provide two measurements for neural network decision explainability and develop automated evaluation algorithms.
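
This record does not describe how the first work's rule-based explanations are constructed, so the following is only a minimal sketch of the general technique, assuming a decision-tree surrogate trained to mimic a network's predictions. The model, data and tree depth are hypothetical, and the two reported measurements (fidelity and rule-set size) merely illustrate the kind of explainability metrics mentioned above; this is not the dissertation's algorithm.

```python
# Minimal sketch (not the dissertation's algorithm): approximate a trained
# network's decisions with human-readable rules via a decision-tree surrogate.
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical tabular data and "black-box" network to be explained.
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=500,
                    random_state=0).fit(X, y)

# Surrogate rule model fitted to the network's predictions, not the labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, net.predict(X))

# Two simple explainability measurements: fidelity (how faithfully the rules
# mimic the network) and the number of rules (how compact the explanation is).
fidelity = accuracy_score(net.predict(X), surrogate.predict(X))
print(f"fidelity={fidelity:.3f}, rules={surrogate.get_n_leaves()}")
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(8)]))
```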

In the second research work, we apply the rule-based idea to identify fairness issues that can be explained. We observe that group discrimination is mostly hidden and less studied. We therefore propose an interpretable testing approach which systematically identifies and measures hidden group discrimination of a neural network, characterized by an interpretable rule set whose rules are conditions over combinations of the sensitive features. Specifically, given a neural network, the approach first automatically generates a rule set and then provides an estimated group fairness score to measure the degree of the identified subtle group discrimination, with theoretical error bounds.
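
The estimated group fairness score and its error bounds from the second work are not defined in this record. As a rough illustration only, the sketch below compares positive-prediction rates between a subgroup described by a rule over sensitive features and the rest of the data, and attaches a Hoeffding-style bound; the rule, the stand-in model and the bound are assumptions, not the dissertation's definitions.

```python
# Rough sketch (assumptions throughout): an estimated group fairness score for
# a subgroup described by an interpretable rule over sensitive features, with
# a Hoeffding-style error bound on the estimate.
import math
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical test inputs: columns 0 and 1 are binary sensitive features
# (say, sex and race); the remaining columns are non-sensitive.
X = np.hstack([rng.integers(0, 2, size=(5000, 2)).astype(float),
               rng.normal(size=(5000, 4))])

def model_predict(X):
    """Stand-in for the neural network under test (hypothetical)."""
    return (X[:, 2] + 0.4 * X[:, 0] > 0).astype(int)

# Interpretable rule describing a subgroup, e.g. "sex = 1 AND race = 0".
in_group = (X[:, 0] == 1) & (X[:, 1] == 0)
y_hat = model_predict(X)

# Estimated group fairness score: gap in positive-prediction rates.
score = abs(y_hat[in_group].mean() - y_hat[~in_group].mean())

# Hoeffding bound + union bound over the two subgroups: with probability at
# least 1 - delta, the true gap lies within eps_in + eps_out of the estimate.
delta = 0.05
eps_in = math.sqrt(math.log(4 / delta) / (2 * in_group.sum()))
eps_out = math.sqrt(math.log(4 / delta) / (2 * (~in_group).sum()))
print(f"score={score:.3f} +/- {eps_in + eps_out:.3f} (confidence {1 - delta:.0%})")
```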

In the third research work, we design an approach which explores the causes of fairness issues and mitigates them systematically. Specifically, we first conduct an empirical study which shows that existing fairness improvement methods are not always effective (e.g., they may improve fairness only at the price of a huge accuracy drop) or even not helpful (e.g., they may worsen both fairness and accuracy). We then propose an approach which adaptively chooses the fairness-improving method based on causality analysis, i.e., based on how the neurons and attributes responsible for unfairness are distributed among the input attributes and the hidden neurons.

Lastly, we present a method which allows us to extend our fairness mitigation approach to Large Language Models. As existing bias mitigation research typically does not apply in the era of LLMs, we propose a non-intrusive bias mitigation approach which does not require accessing or modifying the internals of LLMs. Specifically, we propose a parameter-efficient debias adapter that not only improves fairness systematically but also provides a theoretical statistical guarantee on the achieved fairness whenever feasible during debiasing.
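
The causality analysis used in the third work to choose between fairness-improving methods is only named here, not specified. Purely as an illustration, the sketch below estimates how much each input attribute contributes to a demographic-parity gap via a simple permutation-style intervention, and applies a crude, assumed decision rule to suggest pre-processing versus in-processing; a hidden-neuron analysis would proceed analogously but requires access to the network's internals.

```python
# Illustration only (not the dissertation's causality analysis): estimate each
# input attribute's contribution to unfairness with a permutation-style
# intervention, then pick a repair strategy with an assumed decision rule.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4000, 5))
sensitive = (X[:, 0] > 0).astype(int)  # hypothetical sensitive attribute

def model_predict(X_in):
    """Stand-in biased model (hypothetical)."""
    return (0.8 * X_in[:, 0] + X_in[:, 1] > 0).astype(int)

def parity_gap(X_in):
    """Demographic-parity gap w.r.t. the original sensitive groups."""
    y = model_predict(X_in)
    return abs(y[sensitive == 1].mean() - y[sensitive == 0].mean())

baseline = parity_gap(X)

def attribute_effect(j):
    """Drop in unfairness after breaking attribute j's link to the groups."""
    X_do = X.copy()
    X_do[:, j] = rng.permutation(X_do[:, j])
    return baseline - parity_gap(X_do)

effects = {j: attribute_effect(j) for j in range(X.shape[1])}
print({j: round(e, 3) for j, e in effects.items()})

# Assumed rule of thumb: if a single input attribute explains most of the gap,
# repairing the data (pre-processing) may suffice; otherwise prefer a method
# that also adjusts the model itself (in-processing).
strategy = "pre-processing" if max(effects.values()) > 0.5 * baseline else "in-processing"
print("suggested strategy:", strategy)
```
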
Bibliographic Details
Main Author: ZHANG, Mengdi
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2024
Subjects: AI Fairness; Discrimination; Fairness Improvement; Bias Mitigation; AI Testing; Computer Sciences; OS and Networks
Online Access: https://ink.library.smu.edu.sg/etd_coll/547
https://ink.library.smu.edu.sg/context/etd_coll/article/1545/viewcontent/GPIS_AY2019_PhD_Mengdi_Zhang.pdf
Institution: Singapore Management University
Collection: Dissertations and Theses Collection (Open Access), InK@SMU
License: http://creativecommons.org/licenses/by-nc-nd/4.0/