Reinforcement learning for collective multi-agent decision making

Bibliographic Details
Main Author: NGUYEN, Duc Thien
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2018
Subjects:
Online Access: https://ink.library.smu.edu.sg/etd_coll/162
https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1162&context=etd_coll
Institution: Singapore Management University
Language: English
id sg-smu-ink.etd_coll-1162
record_format dspace
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Multi-agent System
Reinforcement Learning
Decentralized POMDP
OS and Networks
Theory and Algorithms
spellingShingle Multi-agent System
Reinforcement Learning
Decentralized POMDP
OS and Networks
Theory and Algorithms
NGUYEN, Duc Thien
Reinforcement learning for collective multi-agent decision making
description In this thesis, we study reinforcement learning algorithms to collectively optimize decentralized policies in a large population of autonomous agents. We observe that one of the main bottlenecks in large multi-agent systems is the size of the joint trajectory of agents, which grows quickly with the number of participating agents. Furthermore, the noise of actions concurrently executed by different agents in a large system makes it difficult for each agent to estimate the value of its own actions, a difficulty well known as the multi-agent credit assignment problem. We propose a compact representation for multi-agent systems using aggregate counts to address the high complexity of the joint state-action space, and novel reinforcement learning algorithms based on value function decomposition to address the multi-agent credit assignment problem, as follows:

1. Collective Representation: In many real-world systems, such as urban traffic networks, the joint reward and the environment dynamics depend only on the number of agents (the count) involved in interactions, rather than on agent identity. We formulate this sub-class of multi-agent systems as a Collective Decentralized Partially Observable Markov Decision Process (CDEC-POMDP). We show that in a CDEC-POMDP the transition counts, which summarize how many agents take each local action and transition from their current local states to new local states, are sufficient statistics for learning and optimizing the decentralized policy. Furthermore, the dimensions of the count variables are not affected by the population size. This allows us to transform the original planning problem over complex joint agent trajectories into an optimization over compact count variables. In addition, samples of the counts can be obtained efficiently from multinomial distributions, which gives a fast way to simulate the multi-agent system and evaluate the planning policy (see the first sketch below).

2. Collective Multi-agent Reinforcement Learning (MRL): Firstly, to address the multi-agent credit assignment problem in CDEC-POMDPs, we propose the collective decomposition principle for designing the value function approximation and the decentralized policy update. Under this principle, the decentralized policy of each agent is updated using an individualized value instead of the joint global value. We formulate a joint update for the policies of all agents using the counts, which is much more scalable than independent policy updates over joint trajectories. Secondly, based on the collective decomposition principle, we design two classes of MRL algorithms: one for domains with local rewards and one for domains with global rewards. i) When the reward is decomposable into local rewards among agents, we exploit exchangeability in CDEC-POMDPs to estimate the individual value function from sampled values of the counts and average individual rewards. We use this count-based individual value function to derive a new actor-critic algorithm, called fAfC, that learns effective individual policies for agents. ii) When the reward is non-decomposable, the system performance is evaluated by a single global value function instead of individual value functions. To follow the decomposition principle, we show how to estimate the individual contribution values of agents using partial derivatives of the joint value function with respect to the state-action counts (see the second sketch below). This is the basis for two algorithms, called MCAC and CCAC, that optimize individual policies in non-decomposable reward domains.
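A minimal sketch of the count-based simulation idea in part 1 above (the function and variable names are hypothetical and the random dynamics are placeholders; this illustrates the technique, not the thesis's actual implementation): because agents are exchangeable and share one policy, the agents in each local state can be split across actions, and then across successor states, with multinomial draws, so one simulation step costs a number of draws proportional to the number of local state-action pairs rather than to the population size.

```python
import numpy as np

def step_counts(n_s, policy, trans, rng):
    """Advance the whole population one step via count-based sampling.

    n_s[s]          -- number of agents currently in local state s
    policy[s, a]    -- shared decentralized policy pi(a | s)
    trans[s, a, s2] -- local transition kernel P(s2 | s, a)

    Returns (n_sa, n_next): state-action counts and next-state counts.
    The cost is O(|S| * |A|) multinomial draws, independent of the
    number of agents.
    """
    S, A = policy.shape
    n_sa = np.zeros((S, A), dtype=np.int64)
    n_next = np.zeros(S, dtype=np.int64)
    for s in range(S):
        if n_s[s] == 0:
            continue
        # Split the agents in state s across actions: one multinomial draw.
        n_sa[s] = rng.multinomial(n_s[s], policy[s])
        for a in range(A):
            if n_sa[s, a] == 0:
                continue
            # Split the agents taking (s, a) across successor states.
            n_next += rng.multinomial(n_sa[s, a], trans[s, a])
    return n_sa, n_next

# Example: 8000 agents, 5 local states, 3 actions, random dynamics.
rng = np.random.default_rng(0)
n_s = rng.multinomial(8000, np.full(5, 0.2))
policy = rng.dirichlet(np.ones(3), size=5)      # shape (5, 3)
trans = rng.dirichlet(np.ones(5), size=(5, 3))  # shape (5, 3, 5)
n_sa, n_next = step_counts(n_s, policy, trans, rng)
assert n_sa.sum() == n_next.sum() == 8000
```

The same loop simulates 8,000 or 8 million agents at identical cost, which is what makes count-based policy evaluation tractable at the population sizes reported below.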
Experimentally, we show the superiority of the proposed collective MRL algorithms in several testing domains: a real-world taxi supply-demand matching domain, a police patrolling game, and a synthetic robot navigation domain, with population sizes up to 8,000. They converge faster and provide better solutions than existing algorithms in the literature, namely average-flow-based algorithms and the standard actor-critic algorithm.
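To make the decomposition principle of part 2(ii) above concrete, here is a schematic sketch of count-level credit assignment (the quadratic critic is an arbitrary stand-in for the learned joint value function, and the finite-difference gradient is only illustrative; the thesis differentiates a learned value function analytically): every agent executing a state-action pair is credited with the partial derivative of the joint value with respect to that pair's count, yielding an individualized learning signal even though the reward is non-decomposable.

```python
import numpy as np

def individual_values(V, n_sa, eps=1.0):
    """Approximate per-(s, a) contribution values as dV/dn(s, a).

    V    -- joint value function of the state-action count table
    n_sa -- current state-action counts, shape (S, A)

    Every agent executing (s, a) receives grad[s, a] as its
    individualized critic signal in the policy update.
    """
    base = V(n_sa)
    grad = np.zeros(n_sa.shape)
    for idx in np.ndindex(*n_sa.shape):
        bumped = n_sa.astype(np.float64)  # fresh copy each iteration
        bumped[idx] += eps                # one extra agent on this (s, a)
        grad[idx] = (V(bumped) - base) / eps
    return grad

# Stand-in non-decomposable critic: a congestion penalty on squared counts.
V = lambda n: -float(np.sum(np.asarray(n, dtype=np.float64) ** 2))

n_sa = np.array([[10, 5], [0, 85]])
credits = individual_values(V, n_sa)
# Heavily used pairs get strongly negative credit, steering the shared
# policy away from congested state-action pairs; unused pairs stay near 0.
print(credits)
```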
format text
author NGUYEN, Duc Thien
author_facet NGUYEN, Duc Thien
author_sort NGUYEN, Duc Thien
title Reinforcement learning for collective multi-agent decision making
title_short Reinforcement learning for collective multi-agent decision making
title_full Reinforcement learning for collective multi-agent decision making
title_fullStr Reinforcement learning for collective multi-agent decision making
title_full_unstemmed Reinforcement learning for collective multi-agent decision making
title_sort reinforcement learning for collective multi-agent decision making
publisher Institutional Knowledge at Singapore Management University
publishDate 2018
url https://ink.library.smu.edu.sg/etd_coll/162
https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1162&context=etd_coll
_version_ 1712300914684985344
spelling sg-smu-ink.etd_coll-1162 2019-04-10T03:25:31Z Reinforcement learning for collective multi-agent decision making NGUYEN, Duc Thien In this thesis, we study reinforcement learning algorithms to collectively optimize decentralized policies in a large population of autonomous agents. We observe that one of the main bottlenecks in large multi-agent systems is the size of the joint trajectory of agents, which grows quickly with the number of participating agents. Furthermore, the noise of actions concurrently executed by different agents in a large system makes it difficult for each agent to estimate the value of its own actions, a difficulty well known as the multi-agent credit assignment problem. We propose a compact representation for multi-agent systems using aggregate counts to address the high complexity of the joint state-action space, and novel reinforcement learning algorithms based on value function decomposition to address the multi-agent credit assignment problem, as follows: 1. Collective Representation: In many real-world systems, such as urban traffic networks, the joint reward and the environment dynamics depend only on the number of agents (the count) involved in interactions, rather than on agent identity. We formulate this sub-class of multi-agent systems as a Collective Decentralized Partially Observable Markov Decision Process (CDEC-POMDP). We show that in a CDEC-POMDP the transition counts, which summarize how many agents take each local action and transition from their current local states to new local states, are sufficient statistics for learning and optimizing the decentralized policy. Furthermore, the dimensions of the count variables are not affected by the population size. This allows us to transform the original planning problem over complex joint agent trajectories into an optimization over compact count variables. In addition, samples of the counts can be obtained efficiently from multinomial distributions, which gives a fast way to simulate the multi-agent system and evaluate the planning policy. 2. Collective Multi-agent Reinforcement Learning (MRL): Firstly, to address the multi-agent credit assignment problem in CDEC-POMDPs, we propose the collective decomposition principle for designing the value function approximation and the decentralized policy update. Under this principle, the decentralized policy of each agent is updated using an individualized value instead of the joint global value. We formulate a joint update for the policies of all agents using the counts, which is much more scalable than independent policy updates over joint trajectories. Secondly, based on the collective decomposition principle, we design two classes of MRL algorithms: one for domains with local rewards and one for domains with global rewards. i) When the reward is decomposable into local rewards among agents, we exploit exchangeability in CDEC-POMDPs to estimate the individual value function from sampled values of the counts and average individual rewards. We use this count-based individual value function to derive a new actor-critic algorithm, called fAfC, that learns effective individual policies for agents. ii) When the reward is non-decomposable, the system performance is evaluated by a single global value function instead of individual value functions. To follow the decomposition principle, we show how to estimate the individual contribution values of agents using partial derivatives of the joint value function with respect to the state-action counts. This is the basis for two algorithms, called MCAC and CCAC, that optimize individual policies in non-decomposable reward domains. Experimentally, we show the superiority of the proposed collective MRL algorithms in several testing domains: a real-world taxi supply-demand matching domain, a police patrolling game, and a synthetic robot navigation domain, with population sizes up to 8,000. They converge faster and provide better solutions than existing algorithms in the literature, namely average-flow-based algorithms and the standard actor-critic algorithm. 2018-12-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/etd_coll/162 https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1162&context=etd_coll http://creativecommons.org/licenses/by-nc-nd/4.0/ Dissertations and Theses Collection (Open Access) eng Institutional Knowledge at Singapore Management University Multi-agent System Reinforcement Learning Decentralized POMDP OS and Networks Theory and Algorithms