Reinforcement learning approach to coordinate real-world multi-agent dynamic routing and scheduling

In this thesis, we study new variants of routing and scheduling problems motivated by real-world problems from the urban logistics and law enforcement domains. In particular, we focus on two key aspects: dynamic and multi-agent. While routing problems such as the Vehicle Routing Problem (VRP) is wel...

Full description

Saved in:
Bibliographic Details
Main Author: JOE WALDY
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2022
Subjects:
Online Access:https://ink.library.smu.edu.sg/etd_coll/452
https://ink.library.smu.edu.sg/context/etd_coll/article/1450/viewcontent/GPIS_AY2018_PhD_Joe_Waldy.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English
Description
Summary:In this thesis, we study new variants of routing and scheduling problems motivated by real-world problems from the urban logistics and law enforcement domains. In particular, we focus on two key aspects: dynamic and multi-agent. While routing problems such as the Vehicle Routing Problem (VRP) is well-studied in the Operations Research (OR) community, we know that in real-world route planning today, initially-planned route plans and schedules may be disrupted by dynamically-occurring events. In addition, routing and scheduling plans cannot be done in silos due to the presence of other agents which may be independent and self-interested. These requirements create great opportunities for synergized efforts between OR and Artificial Intelligence (AI), particularly Reinforcement Learning (RL) and Distributed AI such as Multi-Agent Systems (MAS). The fundamental research question for dynamic routing and scheduling is: How to make optimal decision within a short period of time?. Routing and scheduling decisions are complex because they are multi-dimensional; consisting of spatial and temporal components. Meanwhile, each occurrence of dynamic event requires an event-handling action and a re-planning action such as assignment/dispatch and rerouting/rescheduling actions respectively. Current approaches are either time consuming or not sample efficient. Meanwhile, to address complex action, most current RL approaches either decompose the problem or action into multiple stages. In this thesis, we propose an RL-based approach that combines Value Function Approximation (VFA) and routing/scheduling heuristic to learn event-handling and re-planning policies jointly without the need for any decomposition step. We show that our approach is faster than sampling-based approaches and more sample efficient than current offline methods for moderately-sized problem instances. We also show experimentally that our joint learning approach outperforms the commonly-used two-stage approach especially when the problem scenario becomes more complex. Multi-agent routing and scheduling problems in real-world context go beyond the classical definition of multi-agent in academic literature which usually takes the form of multiple vehicles or multiple machines. In this thesis, we refer an agent as a independent, higher-order decision-making entity that is capable of executing complex action and usually consists of multiple sub-agents such as Logistics Service Providers (LSPs) where each LSP consists of multiple vehicles. Existing works on multi-agent VRP assume collaboration amongst agents. However, in real-world context, agents may not necessarily be collaborative. One such instance is a problem of coordinating Business-to-Business (B2B) pickup-delivery operations involving multiple LSPs. To address this gap, we formulate such problem as a strategic game and propose a scalable, decentralized, coordinated planning approach based on iterative best response to coordinate multi-agent routing and scheduling. Our proposed approach is able to ensure that there are enough incentives for agents to adopt a coordinated plan rather than planning independently. Our approach is also scalable as it decomposes a multi-agent problem into multiple single-agent problems allowing existing single-agent planning algorithms to be applied to a smaller problem. Most current Multi-Agent RL (MARL) approaches solve dynamic routing and scheduling problems in which an agent is still defined as low-order entity such as vehicle or machine. Moreover, there is no prior work that addresses dynamic routing and scheduling problems where there exist multiple independent higher-order agents capable of making complex decision directly without decomposing the problem or the action. Therefore, in the final contribution of this thesis, we present a pioneering effort on a cooperative MARL approach to solve multi-agent dynamic routing and scheduling problem directly without any decomposition step. This contribution extends our earlier proposed VFA method to address multi-agent setting and incorporates an iterative best response procedure as a decentralized optimization heuristic and an explicit coordination mechanism. We evaluate our approach against a realistic multi-agent dynamic police patrol problem and through a series of ablation studies, ascertain the effectiveness of our proposed learning and coordination mechanisms. This thesis opens up many opportunities for future research, some of which are presented in the concluding chapter, specifically those that represent aspects of RL approach that are peculiar to real-world problem settings.