Neural combinatorial optimization: from routing to integer programming

Combinatorial optimization problems (COPs) are an important branch of mathematical optimization. It covers practical applications in many fields such as communication, transportation, manufacturing and has received massive attention in domains of computer science, operations research, economics, etc...

Full description

Saved in:
Bibliographic Details
Main Author: Wu, Yaoxin
Other Authors: Zhang Jie
Format: Thesis-Doctor of Philosophy
Language:English
Published: Nanyang Technological University 2022
Subjects:
Online Access:https://hdl.handle.net/10356/163617
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Description
Summary:Combinatorial optimization problems (COPs) are an important branch of mathematical optimization. It covers practical applications in many fields such as communication, transportation, manufacturing and has received massive attention in domains of computer science, operations research, economics, etc. However, most COPs are difficult to solve exactly owing to their NP-hardness. Heuristics thus are often leveraged as a popular class of optimization methods to solve COPs. However, classic heuristic methods severely depend on the design of hand-crafted rules for every specific COP, which could limit their performance in practice. Also, finding effective rules in heuristics is non-trivial and often needs massive expertise and tuning work. To overcome the above issues, the neural combinatorial optimization (NCO) domain comes into being, which refers to the research and applications of deep learning for solving COPs. Although there is already abundant literature in such domain, some challenging aspects are still not tackled well. For example, the current deep models are generally still inferior to highly-specialized solvers in some specific COPs, and meanwhile it is still difficult for NCO methods to solve general COPs, especially the ones with uncertainty and multiple objectives. In this doctoral thesis, we propose several NCO methods to automatically learn heuristics to solve 1) routing problems, which are a specific and important class of COPs, 2) integer programming, which is a general formulation to model various COPs in different domains, 3) stochastic integer programming, which can model various COPs with uncertainty, and 4) multi-objective integer programming, which is used to optimize multiple objectives for general COPs. All the proposed NCO methods obviate the need of considerable trial-and-error and domain knowledge. They also manifest themselves in the desirable generalization capacities across problem sizes or distributions. Firstly, we propose a deep reinforcement learning (DRL) framework to learn improvement heuristics for routing problems. We describe the solving process of improvement heuristics as a Markov decision process (MDP), and then design a self-attention based neural architecture for the policy in MDP so that it is used to select the next solution at each step of the local search. We deploy the method to solve two classic routing problems, i.e., travelling salesman problem (TSP) and capacitated vehicle routing problem (CVRP). Results show that the proposed method outperforms state-of-the-art learning based methods. The learned policies for improvement heuristics are more effective than conventional hand-crafted rules and they are able to well generalize to varying problem sizes and initial solutions. However, the above method resorts to simplified pairwise local operators, which limit the efficiency to improve the solution, and it is not applicable to general COPs. To tackle these issues, we then employ DRL to learn large neighborhood search (LNS) algorithm for solving integer programs (IPs), which can be used to model general COPs. The policy in MDP is trained as the destroy operator to pick a subset of variables at each LNS step, which is reoptimized by an IP solver as the repair operator. Since the combinatorial number of variable subsets hinders the direct application of RL algorithms, we propose action factorization to represent all variable subsets with binary decisions on each variable. The policy is parameterized via a neural network and trained by the actor-critic algorithm to select actions for each variable in parallel. Our method is evaluated on four IP benchmark problems. Results show that it significantly boosts the bounded-time performance of SCIP and Gurobi, and the learned policies generalize well to larger problems. Despite that the learned LNS can well solve general COPs by modeling them as IPs, some vital aspects in practical COPs are still not involved, e.g., the uncertainty and multiple objectives. Therefore, we propose two NCO methods to solve stochastic and multi-objective IPs. To solve two-stage stochastic integer programs (2SIPs), we use conditional variational autoencoder (CVAE) to learn scenario representations, where a graph convolutional network (GCN) based encoder embeds each scenario with the deterministic part (i.e. context) of a 2SIP instance into the latent space, from which a decoder reconstructs the scenario from its latent representation conditioned on the context. We apply the trained encoder to scenario reduction and objective prediction, which assist in the search of scenario representatives for attaining approximate solutions to original 2SIPs. Results show that our method achieves high-quality solutions in short runtime and the trained encoder generalizes well to larger problems, more scenarios and varying distributions. To solve multi-objective integer programs (MOIPs), we design a NCO method to refine objective-space decomposition algorithms (ODAs). Since ODAs often encounter difficulties in handling scalarized problems, which could cause infeasibility or repetitive nondominated points and thus induce redundant runtime, we present a graph neural network (GNN) based method to learn the reduction rule in the ODA. We formulate the algorithmic procedure of generic ODAs as a Markov decision process and parameterize the policy (reduction rule) with a novel two-stage GNN for state representation. We train our model with imitation learning and deploy it on a state-of-the-art ODA. Results show that our method significantly improves the solving efficiency of the ODA. The learned policy generalizes fairly well to larger problems or more objectives, and the proposed GNN outperforms existing ones for integer programming in terms of test and generalization accuracy.