Credit assignment in multiagent reinforcement learning for large agent population


Bibliographic Details
Main Author: SINGH, Arambam James
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2021
Subjects:
Online Access:https://ink.library.smu.edu.sg/etd_coll/364
https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1362&context=etd_coll
Institution: Singapore Management University
Language: English
id sg-smu-ink.etd_coll-1362
record_format dspace
spelling sg-smu-ink.etd_coll-13622022-02-28T03:31:20Z Credit assignment in multiagent reinforcement learning for large agent population SINGH, Arambam James In the current age, rapid growth in sectors such as finance and transportation involves fast digitization of industrial processes. This creates a huge opportunity for next-generation artificial intelligence systems with multiple agents operating at scale. Multiagent reinforcement learning (MARL) is the field of study that addresses problems in multiagent systems. In this thesis, we develop and evaluate novel MARL methodologies that address the challenges of large-scale multiagent systems in a cooperative setting. One of the key challenges in cooperative MARL is the problem of credit assignment. Many previous approaches to this problem rely on each agent's individual trajectory, which limits scalability to a small number of agents. Our proposed methodologies are based solely on aggregate information, which provides the benefit of high scalability: the dimension of the key statistics does not change as the agent population grows. We also address other challenges that arise in MARL, such as variable-duration actions, and present preliminary work on credit assignment under a sparse reward model. The first part of this thesis investigates the challenges in a maritime traffic management (MTM) problem, one of the motivating domains for large-scale cooperative multiagent systems. The key research question is how to coordinate vessels in a heavily trafficked maritime environment to increase navigational safety by reducing traffic congestion. The MTM problem is an instance of cooperative MARL with a shared reward: vessels share the same penalty cost for any congestion, so the problem suffers from credit assignment.
We address it by developing a vessel-based value function using aggregate information, which performs effective credit assignment by computing the effectiveness of an agent's policy while filtering out the contributions of other agents. Although this first approach achieved promising results, its ability to handle variable-duration actions, a crucial feature of the problem domain, is limited. We therefore address this challenge using hierarchical reinforcement learning, a framework for control with variable-duration actions. We develop a novel hierarchical-learning-based approach for the maritime traffic control problem and introduce the notion of a meta action, a high-level action that takes a variable amount of time to execute. We also propose an individual meta value function using aggregate information, which effectively addresses the credit assignment problem. We further develop a general approach to credit assignment in large-scale cooperative multiagent systems for both discrete and continuous action settings. We extend a shaped-reward approach known as difference rewards (DR) to address the credit assignment problem. DRs are an effective tool for this problem, but their computation is known to be challenging even for a small number of agents. We propose a scalable method to compute difference rewards based on aggregate information. One limitation of this DR-based approach is that it relies on learning a good approximation of the reward model. In a sparse reward setting, however, agents do not receive an informative immediate reward signal until the episode ends, so the shaped-reward approach is not effective in that case. In this thesis, we also present preliminary work in this direction.
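The variable-duration meta actions described above fit the semi-MDP view of hierarchical reinforcement learning. As a generic illustration only (not the thesis's actual algorithm; the tabular setting, environment interface, and hyperparameters are assumptions), a Q-update for a meta action that took tau primitive steps discounts the bootstrap term by gamma**tau:

```python
from collections import defaultdict

# Minimal semi-MDP Q-learning sketch for variable-duration meta actions.
# Assumes a hypothetical interface where executing a meta action yields the
# accumulated reward, the next state, and the number of primitive steps tau.
GAMMA = 0.99
ALPHA = 0.1

def smdp_q_update(Q, state, meta_action, reward, next_state, tau, actions):
    """One Q-update; the gamma**tau factor accounts for the tau primitive
    steps the meta action took to execute."""
    target = reward + (GAMMA ** tau) * max(Q[(next_state, a)] for a in actions)
    Q[(state, meta_action)] += ALPHA * (target - Q[(state, meta_action)])
    return Q[(state, meta_action)]

# Usage: a meta action ran for tau=3 steps and earned total reward 1.0.
Q = defaultdict(float)
smdp_q_update(Q, "s0", "go_to_zone3", 1.0, "s1", 3, ["go_to_zone3"])
```

With an ordinary one-step update this discount would be a fixed gamma; raising it to the power tau is what lets a single update handle actions of different durations.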
2021-08-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/etd_coll/364 https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1362&context=etd_coll http://creativecommons.org/licenses/by-nc-nd/4.0/ Dissertations and Theses Collection (Open Access) eng Institutional Knowledge at Singapore Management University Multiagent Reinforcement Learning Artificial Intelligence and Robotics Operations Research, Systems Engineering and Industrial Engineering
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Multiagent Reinforcement Learning
Artificial Intelligence and Robotics
Operations Research, Systems Engineering and Industrial Engineering
spellingShingle Multiagent Reinforcement Learning
Artificial Intelligence and Robotics
Operations Research, Systems Engineering and Industrial Engineering
SINGH, Arambam James
Credit assignment in multiagent reinforcement learning for large agent population
description In the current age, rapid growth in sectors such as finance and transportation involves fast digitization of industrial processes. This creates a huge opportunity for next-generation artificial intelligence systems with multiple agents operating at scale. Multiagent reinforcement learning (MARL) is the field of study that addresses problems in multiagent systems. In this thesis, we develop and evaluate novel MARL methodologies that address the challenges of large-scale multiagent systems in a cooperative setting. One of the key challenges in cooperative MARL is the problem of credit assignment. Many previous approaches to this problem rely on each agent's individual trajectory, which limits scalability to a small number of agents. Our proposed methodologies are based solely on aggregate information, which provides the benefit of high scalability: the dimension of the key statistics does not change as the agent population grows. We also address other challenges that arise in MARL, such as variable-duration actions, and present preliminary work on credit assignment under a sparse reward model. The first part of this thesis investigates the challenges in a maritime traffic management (MTM) problem, one of the motivating domains for large-scale cooperative multiagent systems. The key research question is how to coordinate vessels in a heavily trafficked maritime environment to increase navigational safety by reducing traffic congestion. The MTM problem is an instance of cooperative MARL with a shared reward: vessels share the same penalty cost for any congestion, so the problem suffers from credit assignment. We address it by developing a vessel-based value function using aggregate information, which performs effective credit assignment by computing the effectiveness of an agent's policy while filtering out the contributions of other agents.
Although this first approach achieved promising results, its ability to handle variable-duration actions, a crucial feature of the problem domain, is limited. We therefore address this challenge using hierarchical reinforcement learning, a framework for control with variable-duration actions. We develop a novel hierarchical-learning-based approach for the maritime traffic control problem and introduce the notion of a meta action, a high-level action that takes a variable amount of time to execute. We also propose an individual meta value function using aggregate information, which effectively addresses the credit assignment problem. We further develop a general approach to credit assignment in large-scale cooperative multiagent systems for both discrete and continuous action settings. We extend a shaped-reward approach known as difference rewards (DR) to address the credit assignment problem. DRs are an effective tool for this problem, but their computation is known to be challenging even for a small number of agents. We propose a scalable method to compute difference rewards based on aggregate information. One limitation of this DR-based approach is that it relies on learning a good approximation of the reward model. In a sparse reward setting, however, agents do not receive an informative immediate reward signal until the episode ends, so the shaped-reward approach is not effective in that case. In this thesis, we also present preliminary work in this direction.
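The difference-rewards idea above can be sketched with a toy count-based congestion model (a minimal illustration under assumed details: the zone layout, capacity, and penalty function are hypothetical, not taken from the thesis). Because the shared reward depends only on aggregate zone counts, each agent's difference reward G(z) - G(z_-i) can be computed by decrementing a count vector whose size is fixed by the number of zones, independent of the agent population:

```python
import numpy as np

def shared_reward(counts, capacity=5):
    # Toy congestion penalty: each zone is penalized by how far its
    # occupancy count exceeds a fixed capacity.
    return -np.sum(np.maximum(counts - capacity, 0))

def difference_rewards(zones, capacity=5):
    """Per-agent difference reward D_i = G(z) - G(z_-i), computed from
    aggregate counts only. zones[i] is the zone occupied by agent i."""
    counts = np.bincount(zones, minlength=max(zones) + 1)
    g = shared_reward(counts, capacity)
    rewards = []
    for z in zones:
        counts[z] -= 1  # remove agent i from the aggregate statistic
        rewards.append(g - shared_reward(counts, capacity))
        counts[z] += 1  # restore the count
    return rewards
```

For example, with 8 agents in a single zone of capacity 5, the shared penalty is -3, removing any one agent leaves -2, so each agent receives a difference reward of -1; with the agents spread below capacity, every difference reward is 0. The key scalability property mirrors the abstract's claim: each counterfactual is a constant-size count update rather than a re-simulation of individual trajectories.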
format text
author SINGH, Arambam James
author_facet SINGH, Arambam James
author_sort SINGH, Arambam James
title Credit assignment in multiagent reinforcement learning for large agent population
title_short Credit assignment in multiagent reinforcement learning for large agent population
title_full Credit assignment in multiagent reinforcement learning for large agent population
title_fullStr Credit assignment in multiagent reinforcement learning for large agent population
title_full_unstemmed Credit assignment in multiagent reinforcement learning for large agent population
title_sort credit assignment in multiagent reinforcement learning for large agent population
publisher Institutional Knowledge at Singapore Management University
publishDate 2021
url https://ink.library.smu.edu.sg/etd_coll/364
https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1362&context=etd_coll
_version_ 1745575011628875776