Reinforcement learning for collective multi-agent decision making

Bibliographic Details
Main Author: NGUYEN, Duc Thien
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2018
Subjects:
Online Access: https://ink.library.smu.edu.sg/etd_coll/162
https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1162&context=etd_coll
Institution: Singapore Management University
Language: English
id sg-smu-ink.etd_coll-1162
record_format dspace
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Multi-agent System
Reinforcement Learning
Decentralized POMDP
OS and Networks
Theory and Algorithms
spellingShingle Multi-agent System
Reinforcement Learning
Decentralized POMDP
OS and Networks
Theory and Algorithms
NGUYEN, Duc Thien
Reinforcement learning for collective multi-agent decision making
description In this thesis, we study reinforcement learning algorithms to collectively optimize decentralized policies in a large population of autonomous agents. We observe that one of the main bottlenecks in large multi-agent systems is the size of the joint trajectory of agents, which grows quickly with the number of participating agents. Furthermore, the noise of actions concurrently executed by different agents in a large system makes it difficult for each agent to estimate the value of its own actions, a difficulty well known as the multi-agent credit assignment problem. We propose a compact representation for multi-agent systems using aggregate counts to address the high complexity of the joint state-action space, and novel reinforcement learning algorithms based on value function decomposition to address the multi-agent credit assignment problem, as follows:

1. Collective Representation: In many real-world systems, such as urban traffic networks, the joint reward and the environment dynamics depend only on the number of agents (the count) involved in interactions, rather than on agent identity. We formulate this sub-class of multi-agent systems as a Collective Decentralized Partially Observable Markov Decision Process (CDEC-POMDP). We show that in a CDEC-POMDP the transition counts, which summarize how many agents take each local action and transition from their current local states to new local states, are sufficient statistics for learning and optimizing the decentralized policy. Furthermore, the dimensions of the count variables are not affected by the population size. This allows us to transform the original planning problem over complex joint agent trajectories into an optimization over compact count variables. In addition, samples of the counts can be obtained efficiently from multinomial distributions, which gives a fast way to simulate the multi-agent system and evaluate the planning policy (see the first sketch below).

2. Collective Multi-agent Reinforcement Learning (MRL): Firstly, to address the multi-agent credit assignment problem in CDEC-POMDPs, we propose the collective decomposition principle for designing the value function approximation and the decentralized policy update. Under this principle, the decentralized policy of each agent is updated using an individualized value instead of the joint global value. We formulate a joint update for the policies of all agents using the counts, which is much more scalable than independent policy updates over joint trajectories. Secondly, based on the collective decomposition principle, we design two classes of MRL algorithms: one for domains with local rewards and one for domains with global rewards. i) When the reward is decomposable into local rewards among agents, we exploit exchangeability in CDEC-POMDPs to estimate the individual value function from sampled values of the counts and average individual rewards. We use this count-based individual value function to derive a new actor-critic algorithm, called fAfC, that learns effective individual policies for agents. ii) When the reward is non-decomposable, the system performance is evaluated by a single global value function instead of individual value functions. To follow the decomposition principle, we show how to estimate the individual contribution values of agents using partial derivatives of the joint value function with respect to the state-action counts (see the second sketch below). This is the basis for two algorithms, called MCAC and CCAC, that optimize individual policies in non-decomposable reward domains.
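A minimal sketch of the count-based simulation idea in part 1 above (the function and variable names are hypothetical and the random dynamics are placeholders; this illustrates the technique, not the thesis's actual implementation): because agents are exchangeable and share one policy, the agents in each local state can be split across actions, and then across successor states, with multinomial draws, so one simulation step costs a number of draws proportional to the number of local state-action pairs rather than to the population size.

```python
import numpy as np

def step_counts(n_s, policy, trans, rng):
    """Advance the whole population one step via count-based sampling.

    n_s[s]          -- number of agents currently in local state s
    policy[s, a]    -- shared decentralized policy pi(a | s)
    trans[s, a, s2] -- local transition kernel P(s2 | s, a)

    Returns (n_sa, n_next): state-action counts and next-state counts.
    The cost is O(|S| * |A|) multinomial draws, independent of the
    number of agents.
    """
    S, A = policy.shape
    n_sa = np.zeros((S, A), dtype=np.int64)
    n_next = np.zeros(S, dtype=np.int64)
    for s in range(S):
        if n_s[s] == 0:
            continue
        # Split the agents in state s across actions: one multinomial draw.
        n_sa[s] = rng.multinomial(n_s[s], policy[s])
        for a in range(A):
            if n_sa[s, a] == 0:
                continue
            # Split the agents taking (s, a) across successor states.
            n_next += rng.multinomial(n_sa[s, a], trans[s, a])
    return n_sa, n_next

# Example: 8000 agents, 5 local states, 3 actions, random dynamics.
rng = np.random.default_rng(0)
n_s = rng.multinomial(8000, np.full(5, 0.2))
policy = rng.dirichlet(np.ones(3), size=5)      # shape (5, 3)
trans = rng.dirichlet(np.ones(5), size=(5, 3))  # shape (5, 3, 5)
n_sa, n_next = step_counts(n_s, policy, trans, rng)
assert n_sa.sum() == n_next.sum() == 8000
```

The same loop simulates 8,000 or 8 million agents at identical cost, which is what makes count-based policy evaluation tractable at the population sizes reported below.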
Experimentally, we show the superiority of the proposed collective MRL algorithms in several testing domains: a real-world taxi supply-demand matching domain, a police patrolling game, and a synthetic robot navigation domain, with population sizes up to 8,000. They converge faster and provide better solutions than existing algorithms in the literature, namely average-flow-based algorithms and the standard actor-critic algorithm.
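To make the decomposition principle of part 2(ii) above concrete, here is a schematic sketch of count-level credit assignment (the quadratic critic is an arbitrary stand-in for the learned joint value function, and the finite-difference gradient is only illustrative; the thesis differentiates a learned value function analytically): every agent executing a state-action pair is credited with the partial derivative of the joint value with respect to that pair's count, yielding an individualized learning signal even though the reward is non-decomposable.

```python
import numpy as np

def individual_values(V, n_sa, eps=1.0):
    """Approximate per-(s, a) contribution values as dV/dn(s, a).

    V    -- joint value function of the state-action count table
    n_sa -- current state-action counts, shape (S, A)

    Every agent executing (s, a) receives grad[s, a] as its
    individualized critic signal in the policy update.
    """
    base = V(n_sa)
    grad = np.zeros(n_sa.shape)
    for idx in np.ndindex(*n_sa.shape):
        bumped = n_sa.astype(np.float64)  # fresh copy each iteration
        bumped[idx] += eps                # one extra agent on this (s, a)
        grad[idx] = (V(bumped) - base) / eps
    return grad

# Stand-in non-decomposable critic: a congestion penalty on squared counts.
V = lambda n: -float(np.sum(np.asarray(n, dtype=np.float64) ** 2))

n_sa = np.array([[10, 5], [0, 85]])
credits = individual_values(V, n_sa)
# Heavily used pairs get strongly negative credit, steering the shared
# policy away from congested state-action pairs; unused pairs stay near 0.
print(credits)
```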
format text
author NGUYEN, Duc Thien
author_facet NGUYEN, Duc Thien
author_sort NGUYEN, Duc Thien
title Reinforcement learning for collective multi-agent decision making
title_short Reinforcement learning for collective multi-agent decision making
title_full Reinforcement learning for collective multi-agent decision making
title_fullStr Reinforcement learning for collective multi-agent decision making
title_full_unstemmed Reinforcement learning for collective multi-agent decision making
title_sort reinforcement learning for collective multi-agent decision making
publisher Institutional Knowledge at Singapore Management University
publishDate 2018
url https://ink.library.smu.edu.sg/etd_coll/162
https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1162&context=etd_coll
_version_ 1712300914684985344
spelling sg-smu-ink.etd_coll-1162 2019-04-10T03:25:31Z Reinforcement learning for collective multi-agent decision making NGUYEN, Duc Thien In this thesis, we study reinforcement learning algorithms to collectively optimize decentralized policies in a large population of autonomous agents. We observe that one of the main bottlenecks in large multi-agent systems is the size of the joint trajectory of agents, which grows quickly with the number of participating agents. Furthermore, the noise of actions concurrently executed by different agents in a large system makes it difficult for each agent to estimate the value of its own actions, a difficulty well known as the multi-agent credit assignment problem. We propose a compact representation for multi-agent systems using aggregate counts to address the high complexity of the joint state-action space, and novel reinforcement learning algorithms based on value function decomposition to address the multi-agent credit assignment problem, as follows: 1. Collective Representation: In many real-world systems, such as urban traffic networks, the joint reward and the environment dynamics depend only on the number of agents (the count) involved in interactions, rather than on agent identity. We formulate this sub-class of multi-agent systems as a Collective Decentralized Partially Observable Markov Decision Process (CDEC-POMDP). We show that in a CDEC-POMDP the transition counts, which summarize how many agents take each local action and transition from their current local states to new local states, are sufficient statistics for learning and optimizing the decentralized policy. Furthermore, the dimensions of the count variables are not affected by the population size. This allows us to transform the original planning problem over complex joint agent trajectories into an optimization over compact count variables. In addition, samples of the counts can be obtained efficiently from multinomial distributions, which gives a fast way to simulate the multi-agent system and evaluate the planning policy. 2. Collective Multi-agent Reinforcement Learning (MRL): Firstly, to address the multi-agent credit assignment problem in CDEC-POMDPs, we propose the collective decomposition principle for designing the value function approximation and the decentralized policy update. Under this principle, the decentralized policy of each agent is updated using an individualized value instead of the joint global value. We formulate a joint update for the policies of all agents using the counts, which is much more scalable than independent policy updates over joint trajectories. Secondly, based on the collective decomposition principle, we design two classes of MRL algorithms: one for domains with local rewards and one for domains with global rewards. i) When the reward is decomposable into local rewards among agents, we exploit exchangeability in CDEC-POMDPs to estimate the individual value function from sampled values of the counts and average individual rewards. We use this count-based individual value function to derive a new actor-critic algorithm, called fAfC, that learns effective individual policies for agents. ii) When the reward is non-decomposable, the system performance is evaluated by a single global value function instead of individual value functions. To follow the decomposition principle, we show how to estimate the individual contribution values of agents using partial derivatives of the joint value function with respect to the state-action counts. This is the basis for two algorithms, called MCAC and CCAC, that optimize individual policies in non-decomposable reward domains. Experimentally, we show the superiority of the proposed collective MRL algorithms in several testing domains: a real-world taxi supply-demand matching domain, a police patrolling game, and a synthetic robot navigation domain, with population sizes up to 8,000. They converge faster and provide better solutions than existing algorithms in the literature, namely average-flow-based algorithms and the standard actor-critic algorithm. 2018-12-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/etd_coll/162 https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1162&context=etd_coll http://creativecommons.org/licenses/by-nc-nd/4.0/ Dissertations and Theses Collection (Open Access) eng Institutional Knowledge at Singapore Management University Multi-agent System Reinforcement Learning Decentralized POMDP OS and Networks Theory and Algorithms