Learning and exploiting shaped reward models for large scale multiagent RL

Many real-world systems involve interaction among a large number of agents to achieve a common goal, for example, air traffic control. Several model-free RL algorithms have been proposed for such settings. A key limitation is that the empirical reward signal in the model-free case is not very effective in...

Bibliographic Details
Main Authors:	SINGH, Arambam James; KUMAR, Akshat; LAU, Hoong Chuin
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2021
Subjects:	Model representation and learning domain models for planning Multi-agent planning And learning Databases and Information Systems
Online Access:	https://ink.library.smu.edu.sg/sis_research/6899 https://ink.library.smu.edu.sg/context/sis_research/article/7902/viewcontent/Learning_and_Exploiting_Shaped_Reward_Models_for_Large_Scale_Multiagent_RL.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-7902
record_format	dspace
record_datestamp	2022-02-07T10:52:41Z
date_issued	2021-08-01T07:00:00Z
series	Research Collection School Of Computing and Information Systems
file_format	application/pdf
rights	http://creativecommons.org/licenses/by-nc-nd/4.0/
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
description	Many real-world systems involve interaction among a large number of agents to achieve a common goal, for example, air traffic control. Several model-free RL algorithms have been proposed for such settings. A key limitation is that the empirical reward signal in the model-free case is not very effective in addressing the multiagent credit assignment problem, that is, determining each agent's contribution to the team's success. This results in lower solution quality and high sample complexity. To address this, we contribute (a) an approach to learn a differentiable reward model for both continuous and discrete action settings by exploiting the collective nature of interactions among agents, a feature commonly present in large-scale multiagent applications; (b) a shaped reward model, analytically derived from the learned reward model, to address the key challenge of credit assignment; and (c) a model-based multiagent RL approach that integrates shaped rewards into well-known RL algorithms such as policy gradient and soft actor-critic. Compared to previous methods, our learned reward models are more accurate, and our approaches achieve better solution quality on synthetic and real-world instances of air traffic control and on cooperative navigation with large agent populations.
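The abstract describes shaped rewards derived analytically from a differentiable reward model that exploits the collective (count-based) nature of agent interactions. The sketch below is not the authors' method and does not reproduce their learned model. It illustrates the general credit-assignment idea with a hand-written, hypothetical quadratic congestion reward over zone counts. The names `team_reward`, `reward_gradient`, `difference_reward`, `shaped_rewards`, the zone capacities `caps`, and the `bonus` parameter are all assumptions made for illustration. It contrasts the classic difference reward D_z = R(n) - R(n - e_z) with the gradient dR/dn_z, which serves as a first-order approximation of it.

```python
from typing import List

# Illustrative sketch only: a hypothetical count-based ("collective") team
# reward, NOT the reward model learned in the paper. The team reward depends
# only on how many agents occupy each zone, and it is differentiable in counts.


def team_reward(counts: List[float], caps: List[float], bonus: float = 1.0) -> float:
    """R(n) = sum_z [bonus * n_z - (n_z / cap_z) ** 2]; caps are assumed zone capacities."""
    return sum(bonus * n - (n / c) ** 2 for n, c in zip(counts, caps))


def reward_gradient(counts: List[float], caps: List[float], bonus: float = 1.0) -> List[float]:
    """Analytic derivative dR/dn_z = bonus - 2 * n_z / cap_z ** 2."""
    return [bonus - 2.0 * n / c ** 2 for n, c in zip(counts, caps)]


def difference_reward(counts: List[float], caps: List[float], zone: int, bonus: float = 1.0) -> float:
    """D_z = R(n) - R(n - e_z): team reward minus the reward without one agent in zone z."""
    without = list(counts)
    without[zone] -= 1
    return team_reward(counts, caps, bonus) - team_reward(without, caps, bonus)


def shaped_rewards(agent_zones: List[int], caps: List[float], bonus: float = 1.0) -> List[float]:
    """Give each agent the difference reward of the zone it occupies."""
    counts = [0.0] * len(caps)
    for z in agent_zones:
        counts[z] += 1
    return [difference_reward(counts, caps, z, bonus) for z in agent_zones]


if __name__ == "__main__":
    caps = [2.0, 2.0]          # hypothetical capacities of two zones
    zones = [0, 0, 0, 0, 1]    # four agents crowd zone 0, one agent is in zone 1
    counts = [4.0, 1.0]
    print(team_reward(counts, caps))      # 0.75, the same signal for every agent
    print(shaped_rewards(zones, caps))    # [-0.75, -0.75, -0.75, -0.75, 0.75]
    print(reward_gradient(counts, caps))  # [-1.0, 0.5], first-order approximation
```

In this toy example every agent observes the same team reward of 0.75. The difference rewards, however, separate the four agents crowding zone 0 (-0.75 each) from the lone agent in zone 1 (+0.75). For this quadratic model, the gradient evaluated at n_z differs from the exact difference reward (-1.0 vs. -0.75 for zone 0) because the exact difference equals the gradient evaluated at n_z - 1/2. This illustrates the sense in which a differentiable reward model lets a gradient stand in for a counterfactual evaluation.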
