Sequential decision learning for social good and fairness

Bibliographic Details
Main Author:	LI, Dexun
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Sequential decision learning; Fairness constraint; Influence maximization; RMAB; Environment design; Social good; Databases and Information Systems
Online Access:	https://ink.library.smu.edu.sg/etd_coll/628 https://ink.library.smu.edu.sg/context/etd_coll/article/1626/viewcontent/GPIS_AY2019_PhD_Li_Dexun.pdf
Institution:	Singapore Management University

id	sg-smu-ink.etd_coll-1626
record_format	dspace
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Sequential decision learning; Fairness constraint; Influence maximization; RMAB; Environment design; Social good; Databases and Information Systems
description	Sequential decision learning is one of the key research areas in artificial intelligence. Typically, a sequence of events is observed through a transformation that introduces uncertainty into the observations, and based on these observations, the recognition process produces a hypothesis of the underlying events. This learning process is characterized by maximizing the sum of the reward signals. However, many real-life problems are inherently constrained by limited resources. Moreover, when learning algorithms are used to inform decisions involving human beings (e.g., security and justice, health interventions), they may inherit potential pre-existing bias in the dataset and exhibit similar discrimination based on protected attributes such as race and gender. Therefore, it is essential to ensure that fairness constraints are met and budget constraints are not violated when applying sequential decision learning algorithms in real-world scenarios. In this dissertation, we focus on the practical problem of fair sequential decision learning that contributes to the social good, within the settings of Restless Multi-Armed Bandits (RMAB) and Reinforcement Learning (RL). The dissertation is split into two major parts. In the first part, we consider the RMAB setting. RMAB is an apt model for decision learning problems in public health interventions (e.g., tuberculosis, maternal and child care), anti-poaching planning, sensor monitoring, personalized recommendations, and many more. In public health settings, the problem is characterized by multiple arms (i.e., patients) whose states evolve in an uncertain manner (e.g., medication usage in the case of tuberculosis), and arms moving to "bad" states have to be steered to "good" outcomes through interventions.
Because resources (e.g., public health workers) are limited, certain individuals, communities, or regions are typically starved of interventions, which can have a significant negative impact on the individual or community in the long term. To that end, we argue for the need to ensure fairness during decision-making (e.g., when selecting which arms/patients receive health interventions). We therefore combine recent advances in RMAB research with our proposed definition of fairness in the face of uncertainty to develop a scalable and efficient algorithm that learns a policy able to handle fairness constraints without significantly sacrificing solution quality. We provide theoretical performance guarantees and validate our approaches on simulated benchmarks. In the second part of the thesis, we address sequential decision learning in a reinforcement learning setting, starting with the problem of influence maximization in an unknown social network. The objective is to identify a set of peer leaders within a real-world physical social network who can disseminate information to a large group of people. This approach has found a wide range of applications, including HIV prevention, substance abuse prevention, and micro-finance adoption. Unlike online social networks, real-world networks are not completely known, and collecting information about the network is costly because it involves surveying multiple people. Specifically, we focus on the network discovery process for influence maximization under a limited budget (i.e., a limited number of surveys). Because interactions with the environment in real-world settings are costly, it is crucial for reinforcement learning algorithms to require as few environment interactions as possible, i.e., to be sample efficient. To achieve this, we propose a curriculum-based approach that enhances the sample efficiency of existing RL methods.
Our proposed algorithm is shown to outperform existing approaches while remaining sample efficient. We further explore training generally capable RL agents in complex environments. Recent research has highlighted the potential of Unsupervised Environment Design (UED), a framework that automatically generates a curriculum of training environments. Agents trained in these environments can develop general capabilities. Specifically, we focus on applying UED in resource-limited scenarios, characterized by a limited number of generated environments and limited training horizons. To this end, we introduce a hierarchical MDP framework, which consists of an upper-level RL teacher agent tasked with generating suitable training environments for a lower-level student agent. By observing a representation of the student policy, the RL teacher can leverage previously discovered environment structures and generate challenging environments at the frontier of the student's capabilities. We incorporate an additional fairness reward to guide the environment generation process and leverage recent advances in generative models to minimize the costly collection of experiences required to train the teacher agent. Our proposed method significantly reduces the resource-intensive interactions between agents and environments, and empirical experiments across various domains demonstrate the effectiveness of our approach. Our research can lead to more principled, robust, and widely accepted systems that can be used to assist in training non-expert humans.
format	text
author	LI, Dexun
title	Sequential decision learning for social good and fairness
publisher	Institutional Knowledge at Singapore Management University
publishDate	2024
url	https://ink.library.smu.edu.sg/etd_coll/628 https://ink.library.smu.edu.sg/context/etd_coll/article/1626/viewcontent/GPIS_AY2019_PhD_Li_Dexun.pdf
spelling	sg-smu-ink.etd_coll-1626 2024-09-03T07:45:33Z Sequential decision learning for social good and fairness LI, Dexun 2024-07-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/etd_coll/628 https://ink.library.smu.edu.sg/context/etd_coll/article/1626/viewcontent/GPIS_AY2019_PhD_Li_Dexun.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Dissertations and Theses Collection (Open Access) eng Institutional Knowledge at Singapore Management University Sequential decision learning; Fairness constraint; Influence maximization; RMAB; Environment design; Social good; Databases and Information Systems
