Sequential decision learning for social good and fairness

Bibliographic Details
Main Author: LI, Dexun
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2024
Subjects:
Online Access: https://ink.library.smu.edu.sg/etd_coll/628
https://ink.library.smu.edu.sg/context/etd_coll/article/1626/viewcontent/GPIS_AY2019_PhD_Li_Dexun.pdf
Institution: Singapore Management University
Description
Summary: Sequential decision learning is one of the key research areas in artificial intelligence. Typically, a sequence of events is observed through a transformation that introduces uncertainty into the observations, and based on these observations the recognition process produces a hypothesis of the underlying events. The learning process is characterized by maximizing the sum of the reward signals. However, many real-life problems are inherently constrained by limited resources. Moreover, when learning algorithms are used to inform decisions involving human beings (e.g., security and justice, health interventions), they may inherit pre-existing biases in the data and exhibit discrimination against protected attributes such as race and gender. It is therefore essential to ensure that fairness constraints are met and budget constraints are not violated when applying sequential decision learning algorithms in real-world scenarios. In this dissertation, we focus on the practical problem of fair sequential decision learning for social good, within the settings of Restless Multi-Armed Bandits (RMAB) and Reinforcement Learning (RL). The dissertation is split into two major parts.

In the first part of the work, we consider the RMAB setting. RMAB is an apt model for decision learning problems in public health interventions (e.g., tuberculosis, maternal and child care), anti-poaching planning, sensor monitoring, personalized recommendations, and many more. In public health settings, the problem is characterized by multiple arms (i.e., patients) whose states evolve in an uncertain manner (e.g., medication usage in the case of tuberculosis), and arms moving to "bad" states have to be steered towards "good" outcomes through interventions. Because resources (e.g., public health workers) are limited, certain individuals, communities, or regions are typically starved of interventions, which can have a significant negative long-term impact on them. To that end, we argue for the need to ensure fairness during decision-making (e.g., when selecting arms/patients to receive health interventions). We therefore combine recent advances in RMAB research with our proposed definition of fairness in the face of uncertainty to develop a scalable and efficient algorithm that learns a policy satisfying fairness constraints without sacrificing significant solution quality. We provide theoretical performance guarantees and validate our approaches on simulated benchmarks.

In the second part of the thesis, we address sequential decision learning in a reinforcement learning setting, starting with the problem of influence maximization in an unknown social network. The objective is to identify a set of peer leaders within a real-world physical social network who can disseminate information to a large group of people. This approach has found a wide range of applications, including HIV prevention, substance abuse prevention, and micro-finance adoption. Unlike online social networks, real-world networks are not completely known, and collecting information about the network is costly as it involves surveying multiple people. Specifically, we focus on the network discovery process for influence maximization under a limited budget (i.e., a fixed number of surveys).
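
To make the RMAB setting described in the summary more concrete, the following is a minimal sketch of budget-limited arm selection with a crude fairness rule (every arm must be selected in at least a minimum fraction of rounds). The two-state transition model, the myopic selection score, and the fairness rule are illustrative assumptions only, not the dissertation's fairness definition or its algorithm.

    # Minimal sketch of budget- and fairness-constrained arm selection in an RMAB.
    # Transition model, myopic score, and "minimum selection fraction" rule are
    # illustrative assumptions, not the thesis's method.
    import numpy as np

    rng = np.random.default_rng(0)
    num_arms, budget, horizon, min_fraction = 10, 3, 50, 0.1

    # P[arm, action, state] = probability of ending the step in the "good" state (1)
    P = rng.uniform(0.05, 0.95, size=(num_arms, 2, 2))
    P[:, 1, :] = np.maximum(P[:, 1, :], P[:, 0, :])  # intervening never hurts

    states = rng.integers(0, 2, size=num_arms)
    pulls = np.zeros(num_arms)

    for t in range(1, horizon + 1):
        # myopic score: expected gain in reaching the good state if we intervene
        gain = P[np.arange(num_arms), 1, states] - P[np.arange(num_arms), 0, states]
        # fairness first: arms that have fallen below the minimum selection fraction
        starved = np.where(pulls < min_fraction * t)[0]
        chosen = list(starved[np.argsort(-gain[starved])][:budget])
        # fill any remaining budget by largest myopic gain
        remaining = [a for a in np.argsort(-gain) if a not in chosen]
        chosen += remaining[: budget - len(chosen)]
        # apply the interventions and let every arm's state evolve stochastically
        actions = np.zeros(num_arms, dtype=int)
        actions[chosen] = 1
        pulls[chosen] += 1
        states = (rng.random(num_arms) < P[np.arange(num_arms), actions, states]).astype(int)

    print("arms in good state:", states.sum(), "min selection fraction:", (pulls / horizon).min())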
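
Similarly, the budget-limited network discovery problem from the second part can be sketched as below: each survey reveals one person's connections, and peer leaders are then chosen from the discovered subgraph. The survey heuristic, the degree-based seed choice, and the simulated network are assumptions made for illustration, not the approach developed in the thesis.

    # Illustrative sketch of budget-limited network discovery for influence
    # maximization: each survey reveals one node's neighbours, and peer leaders
    # are then chosen greedily from the discovered subgraph.
    import random
    import networkx as nx

    def discover_and_seed(true_graph, survey_budget=5, num_seeds=3, seed=0):
        rng = random.Random(seed)
        discovered = nx.Graph()
        start = rng.choice(list(true_graph.nodes))
        discovered.add_node(start)
        frontier = [start]

        for _ in range(survey_budget):            # each survey costs one unit of budget
            if not frontier:
                break
            node = max(frontier, key=discovered.degree)   # simple heuristic choice
            frontier.remove(node)
            for nbr in true_graph.neighbors(node):        # surveying reveals these edges
                if nbr not in discovered:
                    frontier.append(nbr)
                discovered.add_edge(node, nbr)

        # pick peer leaders greedily by degree in the discovered subgraph
        seeds = sorted(discovered.nodes, key=discovered.degree, reverse=True)[:num_seeds]
        return discovered, seeds

    hidden_network = nx.erdos_renyi_graph(50, 0.08, seed=1)  # stand-in for a real network
    observed, leaders = discover_and_seed(hidden_network)
    print(f"discovered {observed.number_of_nodes()} nodes, chose peer leaders {leaders}")
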
Because interactions with the environment in real-world settings are costly, it is crucial for reinforcement learning algorithms to require as few environment interactions as possible, i.e., to be sample efficient. To achieve this, we propose a curriculum-based approach that enhances the sample efficiency of existing RL methods; our algorithm is demonstrated to outperform existing approaches while using fewer samples.

We further explore training generally capable RL agents in complex environments. Recent research has highlighted the potential of Unsupervised Environment Design (UED), a framework that automatically generates a curriculum of training environments in which agents can develop general capabilities. Specifically, we focus on applying UED in resource-limited scenarios, characterized by a limited number of generated environments and limited training horizons. To this end, we introduce a hierarchical MDP framework that consists of an upper-level RL teacher agent tasked with generating suitable training environments for a lower-level student agent. The teacher can leverage previously discovered environment structures and, by observing a representation of the student's policy, generate challenging environments at the frontier of the student's capabilities. We incorporate an additional fairness reward to guide the environment generation process and leverage recent advances in generative models to minimize the costly collection of experiences required to train the teacher agent. Our method significantly reduces resource-intensive interactions between agents and environments, and empirical experiments across various domains demonstrate its effectiveness. Our research can lead to more principled, robust, and widely accepted systems that can also be used to assist in training non-expert humans.
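
As a rough schematic of the hierarchical teacher-student idea described above, the loop below has an upper-level teacher propose environment parameters after observing a representation of the student's policy, trains the student briefly, and rewards the teacher with the student's outcome plus a fairness-style bonus. Every class, method, and numeric choice here is a hypothetical placeholder rather than the dissertation's implementation.

    # Schematic teacher-student loop for resource-limited UED. Every component
    # below (environment parameterisation, policy representation, teacher reward,
    # fairness bonus) is a hypothetical stand-in, not the dissertation's code.

    class Teacher:
        """Upper-level agent: proposes environment parameters for the student."""
        def propose(self, student_repr):
            # in the thesis this is an RL policy conditioned on a representation of
            # the student's policy; here: difficulty scaled with recent success
            return {"difficulty": min(1.0, 0.2 + 0.8 * student_repr["success_rate"])}

        def update(self, reward):
            pass  # placeholder for the teacher's own RL update

    class Student:
        """Lower-level agent: trains briefly in each generated environment."""
        def __init__(self):
            self.skill = 0.1

        def train(self, env_params, horizon=32):
            # crude stand-in for an RL update over a short training horizon
            gap = env_params["difficulty"] - self.skill
            self.skill += 0.003 * horizon * max(0.0, gap)  # learns most near its frontier
            return max(0.0, 1.0 - abs(gap))                # pseudo success rate

        def representation(self, success_rate):
            return {"skill": self.skill, "success_rate": success_rate}

    def fairness_bonus(env_params, history):
        # encourage coverage of rarely generated difficulty levels (illustrative)
        similar = sum(abs(h["difficulty"] - env_params["difficulty"]) < 0.1 for h in history)
        return 1.0 / (1.0 + similar)

    teacher, student, history, success = Teacher(), Student(), [], 0.0
    for episode in range(20):                  # limited number of generated environments
        params = teacher.propose(student.representation(success))
        success = student.train(params)
        teacher.update(success + fairness_bonus(params, history))
        history.append(params)
    print(f"final student skill after 20 environments: {student.skill:.2f}")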