Decentralized multi-agent reinforcement learning in average-reward dynamic DCOPs
Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) We introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) We introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) We empirically evaluate them against an existing multiarm bandit DCOP algorithm on dynamic DCOPs.
Main Authors: NGUYEN, Duc Thien; YEOH, William; LAU, Hoong Chuin; Zilberstein, Shlomo
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2014
Subjects: Artificial Intelligence and Robotics; Operations Research, Systems Engineering and Industrial Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/2009
https://ink.library.smu.edu.sg/context/sis_research/article/3008/viewcontent/p1341_DecentralizedMulitAgentReinforcementLearningDCOP_2014_aamas.pdf
Institution: Singapore Management University
id: sg-smu-ink.sis_research-3008
record_format: dspace
spelling: sg-smu-ink.sis_research-3008 | indexed 2016-12-15T07:31:26Z | 2014-05-01T07:00:00Z | text | application/pdf | http://creativecommons.org/licenses/by-nc-nd/4.0/ | Research Collection School Of Computing and Information Systems | eng | Institutional Knowledge at Singapore Management University
institution: Singapore Management University
building: SMU Libraries
continent: Asia
country: Singapore
content_provider: SMU Libraries
collection: InK@SMU
language: English
topic: Artificial Intelligence and Robotics; Operations Research, Systems Engineering and Industrial Engineering
description: Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) We introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) We introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) We empirically evaluate them against an existing multiarm bandit DCOP algorithm on dynamic DCOPs.
format: text
author: NGUYEN, Duc Thien; YEOH, William; LAU, Hoong Chuin; Zilberstein, Shlomo
author_sort: NGUYEN, Duc Thien
title: Decentralized multi-agent reinforcement learning in average-reward dynamic DCOPs
title_sort: decentralized multi-agent reinforcement learning in average-reward dynamic dcops
publisher: Institutional Knowledge at Singapore Management University
publishDate: 2014
url: https://ink.library.smu.edu.sg/sis_research/2009
https://ink.library.smu.edu.sg/context/sis_research/article/3008/viewcontent/p1341_DecentralizedMulitAgentReinforcementLearningDCOP_2014_aamas.pdf
_version_: 1770571772630925312