Robust multi-agent team behaviors in uncertain environment via reinforcement learning

Many state-of-the-art cooperative multi-agent reinforcement learning (MARL) approaches, such as MADDPG, COMA, and QMIX, have focused mainly on performing well in idealized scenarios, where agents face environmental conditions and opponents similar to those encountered during training. The resulting policies are often brittle because they overfit to the training environment, and they cannot be easily deployed outside the laboratory. While adversarial learning is one way to train robust policies, most such work has focused on single-agent RL and on adversarial perturbations of a static environment. Some robust MARL methods do build on adversarial training, but they target specialized settings: M3DDPG assumes the extreme case in which all other agents are adversarial, and Phan et al. study agents that malfunction and turn adversarial. Many of these methods also compromise team coordination to achieve robustness, and little emphasis is placed on maintaining good team coordination while ensuring robustness. This is a clear gap: robustness should be a design objective of a MARL algorithm alongside performance, rather than an afterthought. This work focuses on learning robust team policies that perform well even when the environment and opponent behaviour differ significantly from training. We propose the Signal-mediated Team Maxmin (STeaM) framework, an end-to-end MARL framework that approximates the game-theoretic solution concept of team-maxmin equilibrium with a correlation device (TMECor) to address both agent coordination and policy robustness. STeaM uses a pre-agreed signal to coordinate team actions and approximates TMECor policies through consistency and diversity regularizations, combined with a best-response gradient-descent self-play equilibrium-learning procedure. Our experiments show that STeaM learns team policies that approximate TMECor well. These policies consistently achieve higher rewards in adversarial and uncertain situations than policies produced by other state-of-the-art models, and they exhibit bounded performance degradation when tested against previously unseen policies.

Bibliographic Details
Main Author: Yan, Kok Hong
Other Authors: Bo An
Format: Thesis-Master by Research
Language: English
Published: Nanyang Technological University 2022
Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access: https://hdl.handle.net/10356/159448
Institution: Nanyang Technological University
id sg-ntu-dr.10356-159448
record_format dspace
spelling sg-ntu-dr.10356-159448 2022-06-20T02:50:02Z Robust multi-agent team behaviors in uncertain environment via reinforcement learning Yan, Kok Hong Bo An School of Computer Science and Engineering boan@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Many state-of-the-art cooperative multi-agent reinforcement learning (MARL) approaches, such as MADDPG, COMA, and QMIX, have focused mainly on performing well in idealized scenarios, where agents face environmental conditions and opponents similar to those encountered during training. The resulting policies are often brittle because they overfit to the training environment, and they cannot be easily deployed outside the laboratory. While adversarial learning is one way to train robust policies, most such work has focused on single-agent RL and on adversarial perturbations of a static environment. Some robust MARL methods do build on adversarial training, but they target specialized settings: M3DDPG assumes the extreme case in which all other agents are adversarial, and Phan et al. study agents that malfunction and turn adversarial. Many of these methods also compromise team coordination to achieve robustness, and little emphasis is placed on maintaining good team coordination while ensuring robustness. This is a clear gap: robustness should be a design objective of a MARL algorithm alongside performance, rather than an afterthought. This work focuses on learning robust team policies that perform well even when the environment and opponent behaviour differ significantly from training. We propose the Signal-mediated Team Maxmin (STeaM) framework, an end-to-end MARL framework that approximates the game-theoretic solution concept of team-maxmin equilibrium with a correlation device (TMECor) to address both agent coordination and policy robustness. STeaM uses a pre-agreed signal to coordinate team actions and approximates TMECor policies through consistency and diversity regularizations, combined with a best-response gradient-descent self-play equilibrium-learning procedure. Our experiments show that STeaM learns team policies that approximate TMECor well. These policies consistently achieve higher rewards in adversarial and uncertain situations than policies produced by other state-of-the-art models, and they exhibit bounded performance degradation when tested against previously unseen policies. Master of Engineering 2022-06-20T02:50:02Z 2022-06-20T02:50:02Z 2022 Thesis-Master by Research Yan, K. H. (2022). Robust multi-agent team behaviors in uncertain environment via reinforcement learning. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/159448 https://hdl.handle.net/10356/159448 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
spellingShingle Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Yan, Kok Hong
Robust multi-agent team behaviors in uncertain environment via reinforcement learning
description Many state-of-the-art cooperative multi-agent reinforcement learning (MARL) approaches, such as MADDPG, COMA, and QMIX, have focused mainly on performing well in idealized scenarios, where agents face environmental conditions and opponents similar to those encountered during training. The resulting policies are often brittle because they overfit to the training environment, and they cannot be easily deployed outside the laboratory. While adversarial learning is one way to train robust policies, most such work has focused on single-agent RL and on adversarial perturbations of a static environment. Some robust MARL methods do build on adversarial training, but they target specialized settings: M3DDPG assumes the extreme case in which all other agents are adversarial, and Phan et al. study agents that malfunction and turn adversarial. Many of these methods also compromise team coordination to achieve robustness, and little emphasis is placed on maintaining good team coordination while ensuring robustness. This is a clear gap: robustness should be a design objective of a MARL algorithm alongside performance, rather than an afterthought. This work focuses on learning robust team policies that perform well even when the environment and opponent behaviour differ significantly from training. We propose the Signal-mediated Team Maxmin (STeaM) framework, an end-to-end MARL framework that approximates the game-theoretic solution concept of team-maxmin equilibrium with a correlation device (TMECor) to address both agent coordination and policy robustness. STeaM uses a pre-agreed signal to coordinate team actions and approximates TMECor policies through consistency and diversity regularizations, combined with a best-response gradient-descent self-play equilibrium-learning procedure. Our experiments show that STeaM learns team policies that approximate TMECor well. These policies consistently achieve higher rewards in adversarial and uncertain situations than policies produced by other state-of-the-art models, and they exhibit bounded performance degradation when tested against previously unseen policies.
author2 Bo An
author_facet Bo An
Yan, Kok Hong
format Thesis-Master by Research
author Yan, Kok Hong
author_sort Yan, Kok Hong
title Robust multi-agent team behaviors in uncertain environment via reinforcement learning
title_short Robust multi-agent team behaviors in uncertain environment via reinforcement learning
title_full Robust multi-agent team behaviors in uncertain environment via reinforcement learning
title_fullStr Robust multi-agent team behaviors in uncertain environment via reinforcement learning
title_full_unstemmed Robust multi-agent team behaviors in uncertain environment via reinforcement learning
title_sort robust multi-agent team behaviors in uncertain environment via reinforcement learning
publisher Nanyang Technological University
publishDate 2022
url https://hdl.handle.net/10356/159448
_version_ 1736856413454467072
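
The abstract describes STeaM only at a high level. As an illustration of the kind of procedure it outlines (signal-conditioned team policies, consistency and diversity regularization, and best-response self-play), the following is a minimal, self-contained Python sketch on a toy zero-sum matrix game. The regularizer forms, hyperparameters, and all names below are illustrative assumptions made for this sketch, not the thesis's actual implementation.

# Illustrative sketch only (not code from the thesis): one plausible reading of the
# signal-mediated training scheme described in the abstract, on a toy zero-sum matrix game.
# Regularizer forms, weights, and the use of numerical gradients are assumptions.
import numpy as np

rng = np.random.default_rng(0)
N_SIGNALS, N_ACTIONS = 3, 4            # signals from the correlation device; actions per player
LR, LAM_CONS, LAM_DIV = 0.5, 0.05, 0.05

# Team payoff U[a1, a2, b] for team actions (a1, a2) against adversary action b.
U = rng.normal(size=(N_ACTIONS, N_ACTIONS, N_ACTIONS))

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def team_policies(params):
    """Signal-conditioned policies for the two team agents, each of shape (N_SIGNALS, N_ACTIONS)."""
    half = N_SIGNALS * N_ACTIONS
    return (softmax(params[:half].reshape(N_SIGNALS, N_ACTIONS)),
            softmax(params[half:].reshape(N_SIGNALS, N_ACTIONS)))

def objective(params, q):
    """Expected team payoff against adversary policy q, plus the two regularizers."""
    p1, p2 = team_policies(params)
    value = np.einsum('si,sj,k,ijk->', p1, p2, q, U) / N_SIGNALS
    # "Consistency": push each signal-conditioned policy toward near-deterministic play
    # (negative entropy, so maximising it lowers entropy).
    consistency = np.sum(p1 * np.log(p1 + 1e-8)) + np.sum(p2 * np.log(p2 + 1e-8))
    # "Diversity": push policies under different signals apart, so the signal actually coordinates play.
    diversity = 0.0
    for a in range(N_SIGNALS):
        for b in range(a + 1, N_SIGNALS):
            diversity += np.sum((p1[a] - p1[b]) ** 2) + np.sum((p2[a] - p2[b]) ** 2)
    return value + LAM_CONS * consistency + LAM_DIV * diversity

def num_grad(f, x, h=1e-4):
    """Central-difference gradient; keeps the sketch free of autodiff dependencies."""
    g = np.zeros_like(x)
    for i in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp[i] += h
        xm[i] -= h
        g[i] = (f(xp) - f(xm)) / (2 * h)
    return g

params = rng.normal(size=2 * N_SIGNALS * N_ACTIONS)
for step in range(500):
    p1, p2 = team_policies(params)
    # (1) Adversary best response: the pure action minimising the team's expected payoff.
    per_action = np.einsum('si,sj,ijk->k', p1, p2, U) / N_SIGNALS
    q = np.eye(N_ACTIONS)[np.argmin(per_action)]
    # (2) Team gradient ascent on the regularised objective against that best response.
    params += LR * num_grad(lambda x: objective(x, q), params)

p1, p2 = team_policies(params)
worst_case = (np.einsum('si,sj,ijk->k', p1, p2, U) / N_SIGNALS).min()
print("approximate worst-case team value:", round(float(worst_case), 3))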