Deep reinforcement learning-based dynamic scheduling
Main Author: | Liu, Renke |
---|---|
Other Authors: | Rajesh Piplani |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University 2022 |
Subjects: | Engineering::Computer science and engineering; Engineering::Industrial engineering |
Online Access: | https://hdl.handle.net/10356/158353 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-158353 |
---|---|
record_format |
dspace |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering; Engineering::Industrial engineering |
description |
Attempts to address the production scheduling problem have thus far relied on simplifying assumptions, such as a static environment and a fixed problem size, which compromise schedule performance in practice because the system is subject to many unpredictable disruptions. Thus, the study of scheduling in the presence of real-time events, termed dynamic scheduling, continues to attract attention given the agility, flexibility, and timeliness that modern production systems must deliver.
Additionally, the changing nature of manufacturing systems raises new challenges for existing scheduling strategies. At the front end, advanced data creation and exchange frameworks such as the Internet of Things and cyber-physical systems, and their application to the industrial environment, have created an abundance of industrial data, while at the back end, edge and cloud computing technologies greatly enhance the capacity to process that data. Industrial data must be mined and analyzed so that the investment in infrastructure is not wasted and the production system can be managed more effectively and in real time.
Many data-driven technologies have been adopted in scheduling research. A promising candidate among them is reinforcement learning (RL), which can build a direct mapping from observations of the environment to actions that improve performance. In this thesis, a deep multi-agent reinforcement learning (deep MARL) architecture is proposed to solve the dynamic scheduling problem (DSP). A deep reinforcement learning (DRL) algorithm is used to train decentralized scheduling agents to capture the relationship between information on the factory floor and scheduling objectives, with the aim of making real-time decisions for a manufacturing system with frequent unexpected events.
Two major aspects of applying deep MARL to the DSP are addressed in this work: the conversion from the traditional static scheduling problem (SSP) to dynamic scheduling in a practical context, and the adaptation of existing deep MARL algorithms to solve the scheduling problem in such an environment.
Some impractical constraints of traditional studies are removed to create a research context that is closer to actual practice, resulting in a scheduling problem of variable size and scope. Specialized state and action representations that can handle the ever-changing specification of the problem are developed; the criteria for feature selection in a dynamic environment are also discussed.
Recent advances in DRL and MARL research are integrated into the proposed approach after selection and adaptation. In addition, various improvements to the common deep MARL architecture are proposed, including a lightweight multilayer perceptron (MLP) encoder that handles unstructured industrial data efficiently, a training scheme under the multi-agent architecture that improves the stability of training and overall performance, and knowledge-based reward-shaping techniques that decompose the joint reward signal into individual utilities to speed up learning and encourage cooperative behavior between agents.
Simulation studies are then conducted for ablation and validation. In the first stage, the performance of the proposed approach, either as individual components or as an integrated model, is tested in iterative simulation runs, each of which creates a unique production instance. Meanwhile, a set of DRL-based approaches from recent publications is run in parallel. Results suggest that the contribution of each improvement is significant; the integrated architecture also delivers stronger performance than peer DRL-based approaches.
For validation, a set of priority rules that perform strongly in specific contexts and are widely applied in actual production scheduling is used as the benchmark. The proposed approach also provides a performance gain over the strongest rule, with a minor increase in computation cost and negligible latency in decision-making. |
author2 |
Rajesh Piplani |
author_facet |
Rajesh Piplani; Liu, Renke |
format |
Thesis-Doctor of Philosophy |
author |
Liu, Renke |
author_sort |
Liu, Renke |
title |
Deep reinforcement learning-based dynamic scheduling |
publisher |
Nanyang Technological University |
publishDate |
2022 |
url |
https://hdl.handle.net/10356/158353 |
_version_ |
1761781236686127104 |
spelling |
sg-ntu-dr.10356-158353 2023-03-11T18:10:41Z Deep reinforcement learning-based dynamic scheduling Liu, Renke Rajesh Piplani School of Mechanical and Aerospace Engineering Carlos Toro MRPiplani@ntu.edu.sg Engineering::Computer science and engineering Engineering::Industrial engineering Doctor of Philosophy 2022-05-26T01:08:10Z 2022-05-26T01:08:10Z 2022 Thesis-Doctor of Philosophy Liu, R. (2022). Deep reinforcement learning-based dynamic scheduling. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/158353 https://hdl.handle.net/10356/158353 10.32657/10356/158353 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University |