Learning expensive coordination: An event-based deep RL approach
Existing work in deep Multi-Agent Reinforcement Learning (MARL) mainly focuses on coordinating cooperative agents to complete tasks jointly. In many real-world settings, however, agents are self-interested, such as employees in a company or clubs in a league. Therefore, the leader, i.e., the manager of the company or the league, needs to provide bonuses to followers for efficient coordination, which we call expensive coordination.
Main Authors: YU, Runsheng; WANG, Xinrun; WANG, Rundong; ZHANG, Youzhi; AN, Bo; SHI, Zhen Yu; LAI, Hanjiang
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2020
Subjects: Artificial Intelligence and Robotics
Online Access: https://ink.library.smu.edu.sg/sis_research/9147 https://ink.library.smu.edu.sg/context/sis_research/article/10150/viewcontent/108_learning_expensive_coordination_av.pdf
Institution: Singapore Management University
Language: English
id: sg-smu-ink.sis_research-10150
record_format: dspace
spelling: sg-smu-ink.sis_research-10150 2024-08-01T09:18:46Z Learning expensive coordination: An event-based deep RL approach YU, Runsheng; WANG, Xinrun; WANG, Rundong; ZHANG, Youzhi; AN, Bo; SHI, Zhen Yu; LAI, Hanjiang. Existing work in deep Multi-Agent Reinforcement Learning (MARL) mainly focuses on coordinating cooperative agents to complete tasks jointly. In many real-world settings, however, agents are self-interested, such as employees in a company or clubs in a league. Therefore, the leader, i.e., the manager of the company or the league, needs to provide bonuses to followers for efficient coordination, which we call expensive coordination. The main difficulties of expensive coordination are that i) the leader has to consider the long-term effect and predict the followers' behaviors when assigning bonuses, and ii) the complex interactions between followers make the training process hard to converge, especially when the leader's policy changes over time. In this work, we address this problem with an event-based deep RL approach. Our main contributions are threefold. (1) We model the leader's decision-making process as a semi-Markov Decision Process and propose a novel multi-agent event-based policy gradient to learn the leader's long-term policy. (2) We exploit the leader-follower consistency scheme to design a follower-aware module and a follower-specific attention module that predict the followers' behaviors and respond to them accurately. (3) We propose an action abstraction-based policy gradient algorithm that reduces the followers' decision space and thus accelerates the followers' training. Experiments on resource collection, navigation, and the predator-prey game show that our approach dramatically outperforms state-of-the-art methods. 2020-05-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/9147 https://ink.library.smu.edu.sg/context/sis_research/article/10150/viewcontent/108_learning_expensive_coordination_av.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Artificial Intelligence and Robotics
institution: Singapore Management University
building: SMU Libraries
continent: Asia
country: Singapore
content_provider: SMU Libraries
collection: InK@SMU
language: English
topic: Artificial Intelligence and Robotics
spellingShingle: Artificial Intelligence and Robotics; YU, Runsheng; WANG, Xinrun; WANG, Rundong; ZHANG, Youzhi; AN, Bo; SHI, Zhen Yu; LAI, Hanjiang; Learning expensive coordination: An event-based deep RL approach
description: Existing work in deep Multi-Agent Reinforcement Learning (MARL) mainly focuses on coordinating cooperative agents to complete tasks jointly. In many real-world settings, however, agents are self-interested, such as employees in a company or clubs in a league. Therefore, the leader, i.e., the manager of the company or the league, needs to provide bonuses to followers for efficient coordination, which we call expensive coordination. The main difficulties of expensive coordination are that i) the leader has to consider the long-term effect and predict the followers' behaviors when assigning bonuses, and ii) the complex interactions between followers make the training process hard to converge, especially when the leader's policy changes over time. In this work, we address this problem with an event-based deep RL approach. Our main contributions are threefold. (1) We model the leader's decision-making process as a semi-Markov Decision Process and propose a novel multi-agent event-based policy gradient to learn the leader's long-term policy. (2) We exploit the leader-follower consistency scheme to design a follower-aware module and a follower-specific attention module that predict the followers' behaviors and respond to them accurately. (3) We propose an action abstraction-based policy gradient algorithm that reduces the followers' decision space and thus accelerates the followers' training. Experiments on resource collection, navigation, and the predator-prey game show that our approach dramatically outperforms state-of-the-art methods.
format: text
author: YU, Runsheng; WANG, Xinrun; WANG, Rundong; ZHANG, Youzhi; AN, Bo; SHI, Zhen Yu; LAI, Hanjiang
author_facet: YU, Runsheng; WANG, Xinrun; WANG, Rundong; ZHANG, Youzhi; AN, Bo; SHI, Zhen Yu; LAI, Hanjiang
author_sort: YU, Runsheng
title: Learning expensive coordination: An event-based deep RL approach
title_short: Learning expensive coordination: An event-based deep RL approach
title_full: Learning expensive coordination: An event-based deep RL approach
title_fullStr: Learning expensive coordination: An event-based deep RL approach
title_full_unstemmed: Learning expensive coordination: An event-based deep RL approach
title_sort: learning expensive coordination: an event-based deep rl approach
publisher: Institutional Knowledge at Singapore Management University
publishDate: 2020
url: https://ink.library.smu.edu.sg/sis_research/9147 https://ink.library.smu.edu.sg/context/sis_research/article/10150/viewcontent/108_learning_expensive_coordination_av.pdf
_version_: 1814047755700535296
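
The description above outlines the method at a high level: the leader's decision-making is modelled as a semi-Markov Decision Process and trained with a multi-agent event-based policy gradient, while self-interested followers respond to the bonuses they are offered. The paper's implementation is not part of this record; the sketch below is only a minimal illustration of that general idea. The toy environment, the discrete bonus menu, the followers' threshold best-response rule, and every hyperparameter are assumptions made for illustration, not the authors' setup.

```python
# Minimal sketch (assumption-based, NOT the paper's code) of an event-based
# policy gradient for a leader that pays bonuses to self-interested followers.
import numpy as np

rng = np.random.default_rng(0)

N_FOLLOWERS = 2
BONUS_LEVELS = np.array([0.0, 0.5, 1.0])  # discrete bonus menu (assumed)
COSTS = np.array([0.3, 0.7])              # followers' private effort costs (assumed)
TASK_VALUE = 1.5                          # leader's payoff when a follower cooperates
EPISODE_EVENTS = 5                        # leader decisions per episode (events)
GAMMA = 0.95
LR = 0.05

# Leader policy: one softmax over bonus levels per follower (tabular and
# stateless here for brevity; the paper instead conditions on follower-aware
# features and a follower-specific attention module).
theta = np.zeros((N_FOLLOWERS, len(BONUS_LEVELS)))


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def follower_best_response(bonus, cost):
    """Self-interested follower: cooperate only if the bonus covers its cost."""
    return bonus >= cost


def run_episode(theta):
    """Roll out one episode; the leader acts only at discrete events."""
    log_grads, rewards, gaps = [], [], []
    for _ in range(EPISODE_EVENTS):
        tau = int(rng.integers(1, 4))     # random time until the next event (semi-MDP gap)
        grad_t = np.zeros_like(theta)
        reward_t = 0.0
        for i in range(N_FOLLOWERS):
            probs = softmax(theta[i])
            a = rng.choice(len(BONUS_LEVELS), p=probs)
            bonus = BONUS_LEVELS[a]
            one_hot = np.zeros(len(BONUS_LEVELS))
            one_hot[a] = 1.0
            grad_t[i] = one_hot - probs   # grad of log pi(a) for a softmax policy
            if follower_best_response(bonus, COSTS[i]):
                reward_t += TASK_VALUE - bonus  # bonuses are expensive for the leader
        log_grads.append(grad_t)
        rewards.append(reward_t)
        gaps.append(tau)
    return log_grads, rewards, gaps


# REINFORCE-style update with event-based (semi-MDP) discounting: the return
# is discounted by GAMMA raised to the sojourn time between events.
for _ in range(500):
    log_grads, rewards, gaps = run_episode(theta)
    G = 0.0
    for t in reversed(range(EPISODE_EVENTS)):
        G = rewards[t] + (GAMMA ** gaps[t]) * G
        theta += LR * G * log_grads[t]

print("Learned bonus distribution per follower:")
for i in range(N_FOLLOWERS):
    print(f"  follower {i} (cost {COSTS[i]}): {np.round(softmax(theta[i]), 3)}")
```

In this toy setting the leader should learn to offer each follower roughly the cheapest bonus that still exceeds that follower's private cost, which mirrors the trade-off the abstract calls expensive coordination; the follower-aware prediction modules and the action abstraction-based follower training from contributions (2) and (3) are not sketched here.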