Multi-agent dueling Q-learning with mean field and value decomposition

A great deal of multi-agent reinforcement learning (MARL) work has investigated how multiple agents can effectively accomplish cooperative tasks using value function decomposition methods. However, existing value decomposition methods can only handle cooperative tasks with a shared reward, because these...

Full description

Bibliographic Details
Main Authors: Ding, Shifei, Du, Wei, Ding, Ling, Guo, Lili, Zhang, Jian, An, Bo
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2023
Subjects: Engineering::Computer science and engineering; Value Decomposition; Mixed Cooperative-Competitive Task
Online Access: https://hdl.handle.net/10356/172040
Institution: Nanyang Technological University
id sg-ntu-dr.10356-172040
record_format dspace
spelling sg-ntu-dr.10356-172040, 2023-11-20T04:52:26Z
type Journal Article
citation Ding, S., Du, W., Ding, L., Guo, L., Zhang, J. & An, B. (2023). Multi-agent dueling Q-learning with mean field and value decomposition. Pattern Recognition, 139, 109436. https://dx.doi.org/10.1016/j.patcog.2023.109436
journal Pattern Recognition
issn 0031-3203
doi 10.1016/j.patcog.2023.109436
scopus 2-s2.0-85149890947
handle https://hdl.handle.net/10356/172040
funding This work is supported by the National Natural Science Foundation of China (no. 61976216, no. 62276265 and no. 62206297).
rights © 2023 Elsevier Ltd. All rights reserved.
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
Value Decomposition
Mixed Cooperative-Competitive Task
description A great deal of multi-agent reinforcement learning (MARL) work has investigated how multiple agents can effectively accomplish cooperative tasks using value function decomposition methods. However, existing value decomposition methods can only handle cooperative tasks with a shared reward, because these methods factorize the value function from a global perspective. To tackle competitive tasks and mixed cooperative-competitive tasks with differing individual reward settings, we design the Multi-agent Dueling Q-learning (MDQ) method based on mean-field theory and individual value decomposition. Specifically, we integrate mean-field theory with value decomposition to factorize the value function at the individual level, which can handle mixed cooperative-competitive tasks. Besides, we adopt a dueling network architecture to identify which states are valuable, eliminating the need to learn the effect of each action in each state, thereby enabling efficient learning and better policy evaluation. The proposed MDQ method is applicable not only to cooperative tasks with a shared reward setting, but also to mixed cooperative-competitive tasks with individualized reward settings. As a result, it is flexible and general enough to apply to most multi-agent tasks. Empirical experiments on various mixed cooperative-competitive tasks demonstrate that MDQ significantly outperforms existing multi-agent reinforcement learning methods.
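The two ingredients named in the description, a mean-field action input and a dueling value/advantage split, can be illustrated with a short sketch. The following is a minimal per-agent example, assuming PyTorch, discrete actions, and hypothetical layer sizes and names; it is not the authors' MDQ implementation, only an illustration of a dueling Q-head conditioned on an agent's own observation and its neighbours' mean action.

# Minimal sketch (assumed PyTorch, hypothetical sizes), not the authors' MDQ code:
# a per-agent dueling Q-head over [own observation, mean action of neighbours].
import torch
import torch.nn as nn

class DuelingMeanFieldQ(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        # Shared encoder over the concatenated observation and mean-field action.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim + n_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Dueling streams: state value V(s) and per-action advantage A(s, a).
        self.value = nn.Linear(hidden, 1)
        self.advantage = nn.Linear(hidden, n_actions)

    def forward(self, obs: torch.Tensor, mean_action: torch.Tensor) -> torch.Tensor:
        h = self.encoder(torch.cat([obs, mean_action], dim=-1))
        v = self.value(h)       # shape (batch, 1)
        a = self.advantage(h)   # shape (batch, n_actions)
        # Dueling aggregation Q = V + (A - mean_a A) keeps V and A identifiable,
        # so the network can rate a state without learning every action's effect.
        return v + a - a.mean(dim=-1, keepdim=True)

# Shape check: 5 agents, observation size 16, 4 discrete actions.
q_net = DuelingMeanFieldQ(obs_dim=16, n_actions=4)
obs = torch.randn(5, 16)
mean_action = torch.full((5, 4), 0.25)  # neighbours' empirical action distribution
q_values = q_net(obs, mean_action)      # -> tensor of shape (5, 4)

The subtraction of the mean advantage is the standard dueling-network identifiability trick, and feeding the neighbours' average action is the usual mean-field approximation of the joint action; how MDQ combines these with individual-level value decomposition is detailed in the paper itself.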
author2 School of Computer Science and Engineering
format Article
author Ding, Shifei
Du, Wei
Ding, Ling
Guo, Lili
Zhang, Jian
An, Bo
title Multi-agent dueling Q-learning with mean field and value decomposition
publishDate 2023
url https://hdl.handle.net/10356/172040