Robust multi-agent team behaviors in uncertain environment via reinforcement learning

Many state-of-the-art cooperative multi-agent reinforcement learning (MARL) approaches, such as MADDPG, COMA, and QMIX, have focused mainly on performing well in idealized scenarios, where agents face environmental conditions and opponents similar to those encountered during training. The resulting policies are often brittle because they overfit to the training environment, and they cannot be easily deployed outside the laboratory. While adversarial learning is one way to train robust policies, most such work has focused on single-agent RL and on adversarial perturbations of a static environment. Some robust MARL methods do build on adversarial training, but they target specialized settings: M3DDPG assumes the extreme case in which all other agents are adversarial, and Phan et al. study agents that malfunction and turn adversarial. Many of these methods also compromise team coordination to achieve robustness, and little emphasis is placed on maintaining good team coordination while ensuring robustness. This is a clear gap: robustness should be a design objective of a MARL algorithm alongside performance, rather than an afterthought. This work focuses on learning robust team policies that perform well even when the environment and opponent behaviour differ significantly from training. We propose the Signal-mediated Team Maxmin (STeaM) framework, an end-to-end MARL framework that approximates the game-theoretic solution concept of team-maxmin equilibrium with a correlation device (TMECor) to address both agent coordination and policy robustness. STeaM uses a pre-agreed signal to coordinate team actions and approximates TMECor policies through consistency and diversity regularizations, combined with a best-response gradient-descent self-play equilibrium-learning procedure. Our experiments show that STeaM learns team policies that approximate TMECor well. These policies consistently achieve higher rewards in adversarial and uncertain situations than policies produced by other state-of-the-art models, and they exhibit bounded performance degradation when tested against previously unseen policies.

Bibliographic Details
Main Author: Yan, Kok Hong
Other Authors: Bo An
Format: Thesis-Master by Research
Language: English
Published: Nanyang Technological University 2022
Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access: https://hdl.handle.net/10356/159448
Institution: Nanyang Technological University
id sg-ntu-dr.10356-159448
record_format dspace
spelling sg-ntu-dr.10356-159448 2022-06-20T02:50:02Z Robust multi-agent team behaviors in uncertain environment via reinforcement learning Yan, Kok Hong Bo An School of Computer Science and Engineering boan@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Many state-of-the-art cooperative multi-agent reinforcement learning (MARL) approaches, such as MADDPG, COMA, and QMIX, have focused mainly on performing well in idealized scenarios, where agents face environmental conditions and opponents similar to those encountered during training. The resulting policies are often brittle because they overfit to the training environment, and they cannot be easily deployed outside the laboratory. While adversarial learning is one way to train robust policies, most such work has focused on single-agent RL and on adversarial perturbations of a static environment. Some robust MARL methods do build on adversarial training, but they target specialized settings: M3DDPG assumes the extreme case in which all other agents are adversarial, and Phan et al. study agents that malfunction and turn adversarial. Many of these methods also compromise team coordination to achieve robustness, and little emphasis is placed on maintaining good team coordination while ensuring robustness. This is a clear gap: robustness should be a design objective of a MARL algorithm alongside performance, rather than an afterthought. This work focuses on learning robust team policies that perform well even when the environment and opponent behaviour differ significantly from training. We propose the Signal-mediated Team Maxmin (STeaM) framework, an end-to-end MARL framework that approximates the game-theoretic solution concept of team-maxmin equilibrium with a correlation device (TMECor) to address both agent coordination and policy robustness. STeaM uses a pre-agreed signal to coordinate team actions and approximates TMECor policies through consistency and diversity regularizations, combined with a best-response gradient-descent self-play equilibrium-learning procedure. Our experiments show that STeaM learns team policies that approximate TMECor well. These policies consistently achieve higher rewards in adversarial and uncertain situations than policies produced by other state-of-the-art models, and they exhibit bounded performance degradation when tested against previously unseen policies. Master of Engineering 2022-06-20T02:50:02Z 2022-06-20T02:50:02Z 2022 Thesis-Master by Research Yan, K. H. (2022). Robust multi-agent team behaviors in uncertain environment via reinforcement learning. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/159448 https://hdl.handle.net/10356/159448 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
spellingShingle Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Yan, Kok Hong
Robust multi-agent team behaviors in uncertain environment via reinforcement learning
description Many state-of-the-art cooperative multi-agent reinforcement learning (MARL) approaches, such as MADDPG, COMA, and QMIX, have focused mainly on performing well in idealized scenarios, where agents face environmental conditions and opponents similar to those encountered during training. The resulting policies are often brittle because they overfit to the training environment, and they cannot be easily deployed outside the laboratory. While adversarial learning is one way to train robust policies, most such work has focused on single-agent RL and on adversarial perturbations of a static environment. Some robust MARL methods do build on adversarial training, but they target specialized settings: M3DDPG assumes the extreme case in which all other agents are adversarial, and Phan et al. study agents that malfunction and turn adversarial. Many of these methods also compromise team coordination to achieve robustness, and little emphasis is placed on maintaining good team coordination while ensuring robustness. This is a clear gap: robustness should be a design objective of a MARL algorithm alongside performance, rather than an afterthought. This work focuses on learning robust team policies that perform well even when the environment and opponent behaviour differ significantly from training. We propose the Signal-mediated Team Maxmin (STeaM) framework, an end-to-end MARL framework that approximates the game-theoretic solution concept of team-maxmin equilibrium with a correlation device (TMECor) to address both agent coordination and policy robustness. STeaM uses a pre-agreed signal to coordinate team actions and approximates TMECor policies through consistency and diversity regularizations, combined with a best-response gradient-descent self-play equilibrium-learning procedure. Our experiments show that STeaM learns team policies that approximate TMECor well. These policies consistently achieve higher rewards in adversarial and uncertain situations than policies produced by other state-of-the-art models, and they exhibit bounded performance degradation when tested against previously unseen policies.
author2 Bo An
author_facet Bo An
Yan, Kok Hong
format Thesis-Master by Research
author Yan, Kok Hong
author_sort Yan, Kok Hong
title Robust multi-agent team behaviors in uncertain environment via reinforcement learning
title_short Robust multi-agent team behaviors in uncertain environment via reinforcement learning
title_full Robust multi-agent team behaviors in uncertain environment via reinforcement learning
title_fullStr Robust multi-agent team behaviors in uncertain environment via reinforcement learning
title_full_unstemmed Robust multi-agent team behaviors in uncertain environment via reinforcement learning
title_sort robust multi-agent team behaviors in uncertain environment via reinforcement learning
publisher Nanyang Technological University
publishDate 2022
url https://hdl.handle.net/10356/159448
_version_ 1736856413454467072
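
The abstract describes STeaM only at a high level. As an illustration of the kind of procedure it outlines (signal-conditioned team policies, consistency and diversity regularization, and best-response self-play), the following is a minimal, self-contained Python sketch on a toy zero-sum matrix game. The regularizer forms, hyperparameters, and all names below are illustrative assumptions made for this sketch, not the thesis's actual implementation.

# Illustrative sketch only (not code from the thesis): one plausible reading of the
# signal-mediated training scheme described in the abstract, on a toy zero-sum matrix game.
# Regularizer forms, weights, and the use of numerical gradients are assumptions.
import numpy as np

rng = np.random.default_rng(0)
N_SIGNALS, N_ACTIONS = 3, 4            # signals from the correlation device; actions per player
LR, LAM_CONS, LAM_DIV = 0.5, 0.05, 0.05

# Team payoff U[a1, a2, b] for team actions (a1, a2) against adversary action b.
U = rng.normal(size=(N_ACTIONS, N_ACTIONS, N_ACTIONS))

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def team_policies(params):
    """Signal-conditioned policies for the two team agents, each of shape (N_SIGNALS, N_ACTIONS)."""
    half = N_SIGNALS * N_ACTIONS
    return (softmax(params[:half].reshape(N_SIGNALS, N_ACTIONS)),
            softmax(params[half:].reshape(N_SIGNALS, N_ACTIONS)))

def objective(params, q):
    """Expected team payoff against adversary policy q, plus the two regularizers."""
    p1, p2 = team_policies(params)
    value = np.einsum('si,sj,k,ijk->', p1, p2, q, U) / N_SIGNALS
    # "Consistency": push each signal-conditioned policy toward near-deterministic play
    # (negative entropy, so maximising it lowers entropy).
    consistency = np.sum(p1 * np.log(p1 + 1e-8)) + np.sum(p2 * np.log(p2 + 1e-8))
    # "Diversity": push policies under different signals apart, so the signal actually coordinates play.
    diversity = 0.0
    for a in range(N_SIGNALS):
        for b in range(a + 1, N_SIGNALS):
            diversity += np.sum((p1[a] - p1[b]) ** 2) + np.sum((p2[a] - p2[b]) ** 2)
    return value + LAM_CONS * consistency + LAM_DIV * diversity

def num_grad(f, x, h=1e-4):
    """Central-difference gradient; keeps the sketch free of autodiff dependencies."""
    g = np.zeros_like(x)
    for i in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp[i] += h
        xm[i] -= h
        g[i] = (f(xp) - f(xm)) / (2 * h)
    return g

params = rng.normal(size=2 * N_SIGNALS * N_ACTIONS)
for step in range(500):
    p1, p2 = team_policies(params)
    # (1) Adversary best response: the pure action minimising the team's expected payoff.
    per_action = np.einsum('si,sj,ijk->k', p1, p2, U) / N_SIGNALS
    q = np.eye(N_ACTIONS)[np.argmin(per_action)]
    # (2) Team gradient ascent on the regularised objective against that best response.
    params += LR * num_grad(lambda x: objective(x, q), params)

p1, p2 = team_policies(params)
worst_case = (np.einsum('si,sj,ijk->k', p1, p2, U) / N_SIGNALS).min()
print("approximate worst-case team value:", round(float(worst_case), 3))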