Distributed deep reinforcement learning-based spectrum and power allocation for heterogeneous networks


Bibliographic Details
Main Authors: Yang, Helin, Zhao, Jun, Lam, Kwok-Yan, Xiong, Zehui, Wu, Qingqing, Xiao, Liang
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2023
Subjects:
Online Access: https://hdl.handle.net/10356/166422
Institution: Nanyang Technological University
Description
Summary: This paper investigates the problem of distributed resource management in two-tier heterogeneous networks, where each cell selects its joint device association, spectrum allocation, and power allocation strategy based only on locally observed information, without any central controller. Because the optimization problem with devices' quality-of-service (QoS) constraints is non-convex and NP-hard, it is modeled as a Markov decision process (MDP). Since the network is highly complex, with large state and action spaces, a multi-agent dueling deep-Q-network-based algorithm combined with distributed coordinated learning is proposed to effectively learn the optimized intelligent resource management policy; the algorithm adopts a dueling deep network to learn the action-value distribution by estimating both the state-value and action-advantage functions. With distributed coordinated learning and the dueling architecture, the learning algorithm rapidly converges to the optimized policy. Simulation results demonstrate that the proposed distributed coordinated learning algorithm outperforms other existing learning algorithms in terms of learning efficiency, network data rate, and QoS satisfaction probability.
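The dueling architecture mentioned in the summary decomposes the action value into a state value V(s) and per-action advantages A(s, a), recombined as Q(s, a) = V(s) + A(s, a) − mean(A) so the decomposition is identifiable. A minimal sketch of that aggregation step (toy values only; the paper's actual V and A come from deep neural networks trained per cell, which are not reproduced here):

```python
def dueling_q_values(state_value, advantages):
    """Combine a scalar state value V(s) with per-action advantages A(s, a)
    into action values Q(s, a) = V(s) + A(s, a) - mean(A).
    Subtracting the mean advantage is the standard dueling aggregation."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]

# Toy example: one cell scoring three hypothetical joint
# (spectrum, power) actions from its locally observed state.
q = dueling_q_values(state_value=1.0, advantages=[0.5, -0.5, 0.0])
# Greedy policy: pick the action with the highest Q-value.
best_action = max(range(len(q)), key=lambda i: q[i])
```

Here `q` evaluates to `[1.5, 0.5, 1.0]`, so the greedy choice is action 0; in the multi-agent setting of the paper, each cell would run such a network independently while coordinating its learning with neighbors.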