Decentralized multi-agent reinforcement learning in average-reward dynamic DCOPs

Bibliographic Details
Main Authors: Nguyen, Duc Thien; Yeoh, William; Lau, Hoong Chuin; Zilberstein, Shlomo
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2014
Online Access: https://ink.library.smu.edu.sg/sis_research/2009
https://ink.library.smu.edu.sg/context/sis_research/article/3008/viewcontent/p1341_DecentralizedMulitAgentReinforcementLearningDCOP_2014_aamas.pdf
Institution: Singapore Management University
Description
Summary: Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) We introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) We introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) We empirically evaluate them against an existing multi-arm bandit DCOP algorithm on dynamic DCOPs.
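
As a rough illustration of the learning machinery behind contribution (ii), the sketch below is a minimal, single-agent, tabular version of R-learning (Schwartz, 1993), the classical average-reward update that the paper's Distributed R-learning algorithm builds on. It is not the authors' distributed, multi-agent algorithm: the toy environment, the hyperparameters (alpha, beta, epsilon), and the epsilon-greedy exploration scheme are all illustrative assumptions.

```python
import random


class ToyEnv:
    """Tiny two-state MDP used only for illustration: the action chosen now
    determines the next state, loosely mirroring how the next DCOP in an
    MD-DCOP depends on the current value assignment."""

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        reward = 1.0 if a == self.s else 0.0  # reward for matching the state
        self.s = a                            # the action drives the next state
        return self.s, reward


def r_learning(env, n_states=2, n_actions=2, steps=10_000,
               alpha=0.1, beta=0.01, epsilon=0.1):
    """Tabular R-learning: learns relative action values R(s, a) and an
    estimate rho of the long-run average reward (the optimization criterion
    named in the paper's title)."""
    R = [[0.0] * n_actions for _ in range(n_states)]
    rho = 0.0
    s = env.reset()
    for _ in range(steps):
        greedy = random.random() >= epsilon   # exploration vs. exploitation
        if greedy:
            a = max(range(n_actions), key=lambda x: R[s][x])
        else:
            a = random.randrange(n_actions)
        s2, r = env.step(a)
        best_curr = max(R[s])
        best_next = max(R[s2])
        # Relative-value update: move R(s, a) toward r - rho + max_a' R(s', a')
        R[s][a] += alpha * (r - rho + best_next - R[s][a])
        if greedy:
            # The average-reward estimate is adjusted only on greedy steps
            rho += beta * (r - rho + best_next - best_curr)
        s = s2
    return R, rho


if __name__ == "__main__":
    values, avg_reward = r_learning(ToyEnv())
    print("estimated average reward:", round(avg_reward, 3))
```

The toy environment deliberately makes the next state a function of the action just taken, which loosely mirrors the Markovian dependence that distinguishes MD-DCOPs from the time-decoupled dynamic DCOPs assumed in prior work.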