Learning generalizable heuristics for solving vehicle routing problem under distribution shift
Main Author:
Other Authors:
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2024
Subjects:
Online Access: https://hdl.handle.net/10356/173657
Institution: Nanyang Technological University
Summary: NP-hard combinatorial optimization problems (COPs) are characterized by discrete search spaces and the intractable computation required to find optimal solutions. As a fundamental COP in logistics, the vehicle routing problem (VRP) concerns the cost-optimal delivery of items from a depot to a set of geographically scattered customers by one or more vehicles. It has been investigated extensively for decades and has found widespread real-world applications, such as waste collection, dial-a-ride services and courier delivery. Studies on solving the VRP with deep (reinforcement) learning have emerged in recent years. Different from conventional methods, this line of work aims to automatically search for heuristic policies by using neural networks to learn the underlying patterns in problem instances, which can yield better policies than hand-crafted ones. To narrow the gap to highly optimized conventional heuristic solvers, many efforts have been devoted to designing deep models for VRP variants such as the traveling salesman problem (TSP) and the capacitated vehicle routing problem (CVRP). Although much success has been achieved, existing deep models typically assume a single spatial distribution of nodes (customers) for training, e.g., the uniform distribution, which severely limits their applicability owing to impaired cross-distribution generalization. While it is natural to assume that the majority of typical instances follow an identical distribution, a minority of atypical (yet important) instances following different distributions may always exist in reality. The mono-distribution training paradigm of existing deep models may therefore cause inferior performance, or even failure, on those atypical instances. For example, a trained routing policy may produce unreasonable routes for a group of (important) customers whose location pattern differs from the ones seen in training. Intuitively, an alternative is to train the model by treating instances from all distributions homogeneously. However, this does not necessarily enhance cross-distribution generalization, as the model may still suffer high losses on the group of atypical instances.
To address this issue, in the first work we propose an approach that exploits group distributionally robust optimization (group DRO) to jointly train deep models on instances from multiple groups, where each group follows a different distribution. In particular, we minimize the loss of the worst-case group during training by optimizing the weights over the groups of instances and the parameters of the deep model in an interleaved manner. Notably, the distribution of each instance does not need to be labeled during inference. In addition, we leverage a convolutional neural network (CNN) to learn initial representations of VRP instances, so that distribution-aware features in the spatial patterns can be captured to further boost performance.
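To make the interleaved scheme concrete, the following is a minimal PyTorch-style sketch of one group DRO training step, assuming a generic `model` that returns per-instance losses and a `group_ids` tensor marking each instance's group; the names and interfaces are illustrative, not the thesis implementation.

```python
import torch

# Hypothetical interleaved group DRO step: ascent on group weights, descent on model parameters.
def group_dro_step(model, optimizer, instances, group_ids, group_weights, num_groups, eta=0.01):
    losses = model(instances)                      # assumed: per-instance losses, shape [batch]
    group_losses = torch.stack([
        losses[group_ids == g].mean() if (group_ids == g).any() else losses.new_zeros(())
        for g in range(num_groups)
    ])
    # 1) exponentiated-gradient update of the group weights, up-weighting the worst group
    group_weights = group_weights * torch.exp(eta * group_losses.detach())
    group_weights = group_weights / group_weights.sum()
    # 2) gradient step on the model parameters w.r.t. the weighted (worst-case) loss
    weighted_loss = (group_weights * group_losses).sum()
    optimizer.zero_grad()
    weighted_loss.backward()
    optimizer.step()
    return group_weights
```

Because the group weights adapt to whichever group currently incurs the highest loss, no distribution labels are needed at inference time; the labels are only used to form groups during training.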
Secondly, motivated by the facts that 1) a VRP instance can always be represented as a graph, and 2) the VRP solution depends on the pattern of that graph (e.g., the distribution of nodes), we postulate that transferable structural patterns across diverse graphs can help improve generalization across different distributions. In particular, similar local patterns may appear across graphs even when the graphs come from different distributions. Meanwhile, recent advances in computer vision (CV) and natural language processing (NLP) have shown that pre-training an encoder network in a contrastive learning manner produces more informative and transferable representations for downstream tasks. Following this principle, in the second work we propose a multi-view graph contrastive learning (MVGCL) approach to foster the generalization capability of neural heuristics for VRPs by mining the underlying patterns shared across graphs.
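As a rough illustration of contrastive pre-training for a graph encoder, the sketch below computes an InfoNCE-style loss over two augmented views of the same batch of VRP graphs; the encoder, the augmentations and the exact multi-view design used in MVGCL are assumed here and may differ from the thesis.

```python
import torch
import torch.nn.functional as F

# Illustrative InfoNCE-style contrastive objective over two views of the same graphs.
def contrastive_loss(encoder, view_a, view_b, temperature=0.2):
    z_a = F.normalize(encoder(view_a), dim=-1)     # [batch, dim] embeddings of view A
    z_b = F.normalize(encoder(view_b), dim=-1)     # [batch, dim] embeddings of view B
    logits = z_a @ z_b.t() / temperature           # pairwise similarities between views
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # the two views of the same graph are positives; all other pairs act as negatives
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```

The pre-trained encoder can then be plugged into a downstream neural heuristic, with the intent that representations capturing shared local structure transfer across node distributions.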
In the third work, we propose an ensemble-based deep reinforcement learning method for VRPs, which learns a group of diverse sub-policies to cope with various instance distributions. In particular, to prevent the sub-policies' parameters from converging to the same solution, we enforce diversity across sub-policies by leveraging bootstrapping with random initialization. Moreover, we explicitly encourage dissimilarity between sub-policies through regularization terms during training to further enhance diversity. Experimental results show that our methods outperform state-of-the-art neural baselines on randomly generated instances of various distributions, and also generalize favourably to benchmark instances from TSPLib and CVRPLib, confirming the effectiveness of the overall method and its respective designs.
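The ensemble idea can be pictured with the sketch below: several sub-policies are built with independent random initialization (bootstrapping), and a pairwise penalty discourages them from producing identical action distributions. The helper names and the choice of a KL-based penalty are assumptions for illustration; the thesis may use different regularization terms.

```python
import torch
import torch.nn.functional as F

# Hypothetical ensemble construction: K sub-policies with independent random initialization.
def build_ensemble(make_policy, k=5):
    return [make_policy() for _ in range(k)]       # make_policy is a placeholder factory

# Illustrative diversity regularizer over the sub-policies' action distributions.
def diversity_penalty(action_probs):               # list of [batch, n_actions] probability tensors
    penalty = action_probs[0].new_zeros(())
    for i in range(len(action_probs)):
        for j in range(i + 1, len(action_probs)):
            # negative pairwise KL: adding this term to the training loss rewards
            # disagreement between sub-policies, which encourages diverse behaviour
            penalty = penalty - F.kl_div(
                action_probs[i].log(), action_probs[j], reduction="batchmean")
    return penalty
```

At inference time, such an ensemble can route each instance with whichever sub-policy yields the best solution, so that different sub-policies specialize in different instance distributions.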