Empowering distributed constraint optimization with deep learning
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis - Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/172960 |
Institution: | Nanyang Technological University |
Summary:

Distributed Constraint Optimization Problems (DCOPs) are a fundamental formalism for multi-agent coordination, in which a set of autonomous agents cooperatively assign values to their variables to optimize a global objective. Owing to their ability to capture the essentials of cooperative distributed problem solving, DCOPs have been successfully applied to model problems in many real-world domains, such as radio channel allocation and smart grids. However, solving DCOPs is notoriously difficult due to their NP-hardness, and many existing techniques either deliver poor solutions or incur prohibitive overheads on large-scale problem instances.
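To make the formalism concrete, here is a minimal, illustrative Python sketch (all class and function names are our own, not from the thesis): a DCOP as variables with finite domains plus binary utility tables, a global objective that sums local utilities, and a brute-force solver whose exponential cost motivates the scalable methods developed in this thesis.

```python
# Minimal sketch of the DCOP formalism; names are illustrative only.
from itertools import product

class DCOP:
    """A DCOP: one variable per agent, finite domains, and binary
    constraints mapping a pair of assignments to a local utility."""

    def __init__(self, domains, constraints):
        self.domains = domains          # {var: [values]}
        self.constraints = constraints  # {(var_i, var_j): {(v_i, v_j): utility}}

    def global_utility(self, assignment):
        """Global objective: the sum of all local constraint utilities."""
        return sum(table[(assignment[i], assignment[j])]
                   for (i, j), table in self.constraints.items())

    def solve_exact(self):
        """Brute-force baseline; feasible only for tiny instances, which
        is why scalable approximate methods are needed in practice."""
        variables = list(self.domains)
        best = max(product(*self.domains.values()),
                   key=lambda vs: self.global_utility(dict(zip(variables, vs))))
        return dict(zip(variables, best))

# Toy instance: two agents coordinating over binary domains.
dcop = DCOP(domains={"x1": [0, 1], "x2": [0, 1]},
            constraints={("x1", "x2"): {(0, 0): 3, (0, 1): 1,
                                        (1, 0): 0, (1, 1): 5}})
print(dcop.solve_exact())  # {'x1': 1, 'x2': 1}
```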
This doctoral thesis is devoted to improving existing DCOP solving techniques in terms of both scalability and efficiency, by leveraging the power of heuristic search and deep learning to handle the high-dimensional solution space of DCOPs.
The first two parts of this thesis focus on improving the scalability of two important classes of DCOP algorithms: Belief Propagation (BP) and context-based sampling algorithms. Specifically, in the first part we investigate the defects of GDP, a state-of-the-art acceleration technique for BP, and show that it can fail when the local utility values are densely distributed. In light of this, we propose a class of techniques that deliver excellent acceleration by providing better lower-bound quality and search-space organization, especially on problems with dense local utility values. Extensive experiments on both synthetic and real-world problems demonstrate the clear advantages of our algorithms over the state of the art. In the second part, we develop a memory-efficient context-based sampling algorithm for DCOPs by leveraging deep neural networks (DNNs) to approximate the high-dimensional tables that arise during solving. In more detail, we first introduce a new context-based sampling algorithm built upon regret matching, with a novel regret-rounding scheme that incentivizes exploration. Because regret values are associated with each context, the number of values to store can be exponential in the induced width of the problem. We therefore propose to maintain, for each non-leaf agent, a DNN whose size is polynomial in the induced width to estimate the regret values via online learning. We theoretically establish regret bounds and demonstrate the advantages of our algorithms in various settings, especially when memory is limited.
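As a rough illustration of the regret-matching component described above, here is a hedged Python sketch. The thesis' actual regret-rounding scheme is not specified in this summary; we assume a simple floor `eps` that rounds small cumulative regrets up so every value keeps a nonzero sampling probability, and we store regrets in an explicit per-context table, which is exactly the exponential-size object the thesis replaces with a polynomial-size DNN.

```python
import random
from collections import defaultdict

class RegretSampler:
    """Illustrative context-based sampling via regret matching.
    The `eps` floor stands in for the thesis' regret-rounding scheme
    (an assumption on our part, not the actual rule)."""

    def __init__(self, domain, eps=0.1):
        self.domain = domain
        self.eps = eps
        # One regret table per context; the number of contexts can be
        # exponential in the induced width, motivating DNN approximation.
        self.regret = defaultdict(lambda: {v: 0.0 for v in domain})

    def sample(self, context):
        # Regret rounding: clip each cumulative regret below at eps, so
        # exploration never stops entirely.
        r = {v: max(self.regret[context][v], self.eps) for v in self.domain}
        total = sum(r.values())
        u, acc = random.random() * total, 0.0
        for v in self.domain:
            acc += r[v]
            if u <= acc:
                return v
        return self.domain[-1]

    def update(self, context, utilities, chosen):
        # Standard regret update: how much better each alternative would
        # have been than the value actually sampled.
        for v in self.domain:
            self.regret[context][v] += utilities[v] - utilities[chosen]

# Usage: sample under a context (ancestors' assignment), then update.
sampler = RegretSampler(domain=[0, 1, 2])
v = sampler.sample(context=(0, 1))
sampler.update(context=(0, 1), utilities={0: 2.0, 1: 5.0, 2: 1.0}, chosen=v)
```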
The last two parts of this thesis aim to enhance the performance of existing DCOP algorithms. In the third part, we present the first general-purpose deep model for DCOPs, called GAT-PCM, which generates effective heuristics for a wide range of DCOP algorithms by learning to estimate the optimal cost of a target assignment given a partially instantiated DCOP instance. The model is first pretrained offline on a large amount of optimally labelled data, and then performs fully decentralized inference when generating heuristics for DCOP algorithms. Using local search and backtracking search for DCOPs as examples, we show the clear superiority of GAT-PCM-boosted algorithms through extensive empirical evaluations. In the final part, we investigate using DNNs to boost the performance of BP for solving COPs by learning optimal dynamic hyperparameters. Our model, DABP, seamlessly integrates BP and DNNs within the message-passing framework to reason about dynamic neighbor weights and damping factors for composing new BP messages. Furthermore, unlike existing neural-based BP variants, we propose a novel self-supervised learning algorithm for DABP with a smoothed cost, which does not require expensive training labels and avoids the common out-of-distribution issue through efficient online learning. Extensive experiments show that our model significantly outperforms state-of-the-art baselines.
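To illustrate where DABP's learned quantities enter the computation, below is a hedged min-sum sketch of a damped BP message update. In DABP the neighbor weights and damping factor are produced dynamically by DNNs; here they are plain function arguments, and the shapes follow a generic message-passing convention. This is an assumption-laden sketch, not the thesis' implementation.

```python
import numpy as np

def damped_message(utility, incoming, weights, damping, old_message):
    """One damped min-sum BP message across a binary constraint (i, j).

    utility:     (|Di|, |Dj|) local utility/cost table on constraint (i, j)
    incoming:    messages from i's other neighbors, each of shape (|Di|,)
    weights:     per-neighbor weights -- produced by a DNN in DABP,
                 plain inputs here
    damping:     scalar in [0, 1), likewise learned dynamically in DABP
    old_message: previous message to j, shape (|Dj|,)
    """
    # Weighted aggregation of incoming messages at node i.
    belief = sum(w * m for w, m in zip(weights, incoming))
    # Min-sum combination with the local table (cost minimization).
    new = np.min(utility + belief[:, None], axis=0)
    new -= new.mean()  # normalize to keep message magnitudes bounded
    # Damping: blend the newly computed message with the previous one.
    return damping * old_message + (1.0 - damping) * new
```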