Machine learning frameworks for urban logistics optimisation problems
The Vehicle Routing Problem (VRP), a challenging topic in urban logistics optimisation, is a combinatorial optimisation problem with many exact and heuristic algorithms. The VRP has many variants; for example, the VRPTW is the classic VRP with a time-window constraint. In this project, an end-to-end reinforcement learning (RL) framework is proposed to solve the Vehicle Routing Problem with Time Windows (VRPTW), as an attempt to improve an existing RL framework for the VRP. Applying Proximal Policy Optimization (PPO) and Random Network Distillation (RND), we attempt to improve the performance of the proposed model. By observing reward signals, a single policy model is trained to find near-optimal solutions; PPO improves the policy-gradient algorithm by optimising the parameters of a parameterised stochastic policy, and RND introduces an exploration bonus into the RL model to improve performance on hard-exploration tasks. Instead of retraining for every instance, the proposed approach generates a solution immediately for any VRPTW instance whose customer nodes (locations, time windows), vehicle capacity, and demand distributions match those used in training. A reasonable improvement in performance is observed from applying PPO and RND. Furthermore, the proposed framework has the potential to be extended to more complicated urban logistics optimisation problems.
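The abstract names two off-the-shelf RL components, PPO and RND, without detailing them. As a point of reference only (this is not the project's code), the sketch below shows in PyTorch, with assumed layer sizes and a typical clip ratio of 0.2, what a PPO clipped surrogate loss and an RND-style intrinsic-reward module generally look like; names such as ppo_clipped_loss and RNDBonus are illustrative.

```python
import torch
import torch.nn as nn


def ppo_clipped_loss(new_logp, old_logp, advantage, clip_eps=0.2):
    """PPO-Clip surrogate loss: the probability ratio between the updated
    policy and the data-collecting policy is clipped so a single update
    cannot move the policy too far."""
    ratio = torch.exp(new_logp - old_logp)                  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return -torch.min(unclipped, clipped).mean()            # minimise the negative surrogate


class RNDBonus(nn.Module):
    """RND exploration bonus: a predictor network is trained to match a fixed,
    randomly initialised target network; states the predictor reconstructs
    poorly (i.e. novel states) receive a larger intrinsic reward."""

    def __init__(self, obs_dim: int, feat_dim: int = 64):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                    nn.Linear(128, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                       nn.Linear(128, feat_dim))
        for p in self.target.parameters():                   # the target is never trained
            p.requires_grad_(False)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Intrinsic reward per state = mean squared prediction error,
        # to be added to the extrinsic (environment) reward during training.
        return ((self.predictor(obs) - self.target(obs)) ** 2).mean(dim=-1)
```

In an RL solver for the VRPTW, such an intrinsic reward would typically be added to the routing reward (for example, negative tour length plus time-window penalties) before advantages are computed for the PPO update.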
Saved in:
Main Author: Zhang, Xincai
Other Authors: Xiao Gaoxi (supervisor, EGXXiao@ntu.edu.sg); School of Electrical and Electronic Engineering; Institute of High Performance Computing (IHPC), A*Star
Format: Final Year Project (FYP)
Degree: Bachelor of Engineering (Electrical and Electronic Engineering)
Language: English
Published: Nanyang Technological University, 2020
Subjects: Engineering::Electrical and electronic engineering
Online Access: https://hdl.handle.net/10356/139856
Institution: Nanyang Technological University