Learning expensive coordination: An event-based deep RL approach

Existing works in deep Multi-Agent Reinforcement Learning (MARL) mainly focus on coordinating cooperative agents to complete certain tasks jointly. However, in many real-world cases, agents are self-interested, such as employees in a company or clubs in a league. Therefore, the leader, i.e., the manager of the company or the league, needs to provide bonuses to followers for efficient coordination, which we call expensive coordination. The main difficulties of expensive coordination are that i) the leader has to consider the long-term effect and predict the followers’ behaviors when assigning bonuses, and ii) the complex interactions between followers make the training process hard to converge, especially when the leader’s policy changes over time. In this work, we address this problem through an event-based deep RL approach. Our main contributions are threefold. (1) We model the leader’s decision-making process as a semi-Markov Decision Process and propose a novel multi-agent event-based policy gradient to learn the leader’s long-term policy. (2) We exploit the leader-follower consistency scheme to design a follower-aware module and a follower-specific attention module to predict the followers’ behaviors and respond accurately to them. (3) We propose an action abstraction-based policy gradient algorithm to reduce the followers’ decision space and thus accelerate the training process of followers. Experiments in resource collections, navigation, and the predator-prey game show that our approach dramatically outperforms state-of-the-art methods.
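
For readers who want a concrete picture of the event-based, semi-MDP formulation mentioned in the abstract, the short sketch below is a minimal illustration and not the authors' implementation: it assumes a toy setting in which a leader occasionally grants a bonus (the "event"), waits a random sojourn time, and updates a tabular softmax policy with a REINFORCE-style gradient discounted by the real elapsed time. The environment, the followers' response model, and every hyperparameter here are invented purely for illustration.

```python
# Minimal sketch (not the paper's code) of an event-based, semi-MDP policy gradient
# for a leader that grants bonuses to self-interested followers.
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_BONUS_LEVELS, GAMMA, LR = 5, 3, 0.95, 0.1
theta = np.zeros((N_STATES, N_BONUS_LEVELS))  # leader's tabular softmax policy over bonus levels


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


def rollout(theta, horizon=40):
    """One episode; the leader acts only at 'events' (moments it grants a bonus)."""
    s, t = int(rng.integers(N_STATES)), 0
    events, rewards = [], []                               # (state, bonus, event time) / (reward, reward time)
    while t < horizon:
        probs = softmax(theta[s])
        bonus = int(rng.choice(N_BONUS_LEVELS, p=probs))   # leader's event: chosen bonus level
        tau = int(rng.integers(1, 4))                      # sojourn time until the next event (semi-MDP)
        # Assumed follower response: a larger bonus makes useful work more likely,
        # but paying the bonus is a cost to the leader.
        work = rng.random() < 0.3 + 0.2 * bonus
        r = (2.0 if work else 0.0) - 0.5 * bonus
        events.append((s, bonus, t))
        rewards.append((r, t))
        s = int(rng.integers(N_STATES))                    # abstract, follower-driven state change
        t += tau
    return events, rewards


def update(theta, events, rewards):
    """REINFORCE over events, discounting returns by the real elapsed time between events."""
    for s, a, t_e in events:
        G = sum(r * GAMMA ** (t_r - t_e) for r, t_r in rewards if t_r >= t_e)
        probs = softmax(theta[s])
        grad_log_pi = -probs
        grad_log_pi[a] += 1.0                              # gradient of log softmax at the chosen bonus
        theta[s] += LR * (GAMMA ** t_e) * G * grad_log_pi
    return theta


for _ in range(200):
    ev, rw = rollout(theta)
    theta = update(theta, ev, rw)

print("leader's bonus distribution per state:")
print(np.round(np.array([softmax(row) for row in theta]), 2))
```

The per-event discounting by elapsed time is what distinguishes this semi-MDP treatment from a step-by-step MDP update; the paper additionally learns follower-aware and follower-specific attention modules and an action-abstracted follower policy, all of which this sketch omits.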

Bibliographic Details
Main Authors: YU, Runsheng; WANG, Xinrun; WANG, Rundong; ZHANG, Youzhi; AN, Bo; SHI, Zhen Yu; LAI, Hanjiang
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2020
Subjects: Artificial Intelligence and Robotics
Online Access: https://ink.library.smu.edu.sg/sis_research/9147
https://ink.library.smu.edu.sg/context/sis_research/article/10150/viewcontent/108_learning_expensive_coordination_av.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-10150
record_format dspace
spelling sg-smu-ink.sis_research-10150 2024-08-01T09:18:46Z
Learning expensive coordination: An event-based deep RL approach
YU, Runsheng; WANG, Xinrun; WANG, Rundong; ZHANG, Youzhi; AN, Bo; SHI, Zhen Yu; LAI, Hanjiang
Existing works in deep Multi-Agent Reinforcement Learning (MARL) mainly focus on coordinating cooperative agents to complete certain tasks jointly. However, in many real-world cases, agents are self-interested, such as employees in a company or clubs in a league. Therefore, the leader, i.e., the manager of the company or the league, needs to provide bonuses to followers for efficient coordination, which we call expensive coordination. The main difficulties of expensive coordination are that i) the leader has to consider the long-term effect and predict the followers’ behaviors when assigning bonuses, and ii) the complex interactions between followers make the training process hard to converge, especially when the leader’s policy changes over time. In this work, we address this problem through an event-based deep RL approach. Our main contributions are threefold. (1) We model the leader’s decision-making process as a semi-Markov Decision Process and propose a novel multi-agent event-based policy gradient to learn the leader’s long-term policy. (2) We exploit the leader-follower consistency scheme to design a follower-aware module and a follower-specific attention module to predict the followers’ behaviors and respond accurately to them. (3) We propose an action abstraction-based policy gradient algorithm to reduce the followers’ decision space and thus accelerate the training process of followers. Experiments in resource collections, navigation, and the predator-prey game show that our approach dramatically outperforms state-of-the-art methods.
2020-05-01T07:00:00Z text application/pdf
https://ink.library.smu.edu.sg/sis_research/9147
https://ink.library.smu.edu.sg/context/sis_research/article/10150/viewcontent/108_learning_expensive_coordination_av.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
Artificial Intelligence and Robotics
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Artificial Intelligence and Robotics
description Existing works in deep Multi-Agent Reinforcement Learning (MARL) mainly focus on coordinating cooperative agents to complete certain tasks jointly. However, in many real-world cases, agents are self-interested, such as employees in a company or clubs in a league. Therefore, the leader, i.e., the manager of the company or the league, needs to provide bonuses to followers for efficient coordination, which we call expensive coordination. The main difficulties of expensive coordination are that i) the leader has to consider the long-term effect and predict the followers’ behaviors when assigning bonuses, and ii) the complex interactions between followers make the training process hard to converge, especially when the leader’s policy changes over time. In this work, we address this problem through an event-based deep RL approach. Our main contributions are threefold. (1) We model the leader’s decision-making process as a semi-Markov Decision Process and propose a novel multi-agent event-based policy gradient to learn the leader’s long-term policy. (2) We exploit the leader-follower consistency scheme to design a follower-aware module and a follower-specific attention module to predict the followers’ behaviors and respond accurately to them. (3) We propose an action abstraction-based policy gradient algorithm to reduce the followers’ decision space and thus accelerate the training process of followers. Experiments in resource collections, navigation, and the predator-prey game show that our approach dramatically outperforms state-of-the-art methods.
format text
author YU, Runsheng
WANG, Xinrun
WANG, Rundong
ZHANG, Youzhi
AN, Bo
SHI, Zhen Yu
LAI, Hanjiang
author_sort YU, Runsheng
title Learning expensive coordination: An event-based deep RL approach
title_sort learning expensive coordination: an event-based deep rl approach
publisher Institutional Knowledge at Singapore Management University
publishDate 2020
url https://ink.library.smu.edu.sg/sis_research/9147
https://ink.library.smu.edu.sg/context/sis_research/article/10150/viewcontent/108_learning_expensive_coordination_av.pdf
_version_ 1814047755700535296