Deep reinforcement learning-based dynamic scheduling

Attempts to address the production scheduling problem have thus far relied on simplifying assumptions, such as a static environment and a fixed problem size, which compromise schedule performance in practice because of the many unpredictable disruptions to the system. Thus, the study of scheduling in...

Bibliographic Details
Main Author:	Liu, Renke
Other Authors:	Rajesh Piplani
Format:	Thesis-Doctor of Philosophy
Language:	English
Published:	Nanyang Technological University 2022
Subjects:	Engineering::Computer science and engineering; Engineering::Industrial engineering
Online Access:	https://hdl.handle.net/10356/158353
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-158353
record_format	dspace
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering; Engineering::Industrial engineering
description	Attempts to address the production scheduling problem have thus far relied on simplifying assumptions, such as a static environment and a fixed problem size, which compromise schedule performance in practice because of the many unpredictable disruptions to the system. Thus, the study of scheduling in the presence of real-time events, termed dynamic scheduling, continues to attract attention given the agility, flexibility, and timeliness modern production systems must deliver. Additionally, the changing nature of the manufacturing system raises new challenges for existing scheduling strategies. At the front end, the development of advanced data creation and exchange frameworks such as the Internet of Things and cyber-physical systems, and their application to the industrial environment, has created an abundance of industrial data, while at the back end, edge and cloud computing technologies greatly enhance the capacity to process that data. Industrial data must be mined and analyzed so that the investment in infrastructure is not wasted and the production system is managed more effectively and in real time. Many data-driven technologies have been adopted in scheduling research; a promising candidate among them is reinforcement learning (RL), which can build a direct mapping from observations of the environment to actions that improve its performance. In this thesis, a deep multi-agent reinforcement learning (deep MARL) architecture is proposed to solve the dynamic scheduling problem (DSP). A deep reinforcement learning (DRL) algorithm is used to train decentralized scheduling agents to capture the relationship between information on the factory floor and scheduling objectives, with the aim of making real-time decisions for a manufacturing system with frequent unexpected events.
Two major aspects of applying deep MARL to the DSP are addressed in this work: the conversion from the traditional static scheduling problem (SSP) to dynamic scheduling in a practical context, and the adaptation of existing deep MARL algorithms to solve the scheduling problem in such an environment. Some impractical constraints of traditional studies are removed to create a research context that is closer to actual practice, resulting in a scheduling problem of variable size and scope. Specialized state and action representations that can handle the ever-changing specification of the problem are developed; the criteria for feature selection in a dynamic environment are also discussed. Recent advances in DRL and MARL research are integrated into the proposed approach after selection and adaptation. In addition, various improvements to the common deep MARL architecture are proposed, including a lightweight multilayer perceptron (MLP) encoder that handles unstructured industrial data efficiently, a training scheme under the multi-agent architecture that improves training stability and overall performance, and knowledge-based reward-shaping techniques that decompose the joint reward signal into individual utilities to speed up learning and encourage cooperative behavior between agents. Simulation studies are then conducted for ablation and validation. In the first stage, the performance of the proposed approach, either as individual components or as an integrated model, is tested in iterative simulation runs, each of which creates a unique production instance. Meanwhile, a set of DRL-based approaches from recent publications is run in parallel. Results suggest that the contribution of each improvement is significant; the integrated architecture also delivers stronger performance than peer DRL-based approaches.
For the validation, a set of priority rules that perform strongly in specified contexts and are widely applied in actual production scheduling is used as the benchmark. The proposed approach also provides a performance gain over the strongest rule, with a minor increase in computation cost and negligible latency in decision-making.
author2	Rajesh Piplani
author_facet	Rajesh Piplani Liu, Renke
format	Thesis-Doctor of Philosophy
author	Liu, Renke
author_sort	Liu, Renke
title	Deep reinforcement learning-based dynamic scheduling
publisher	Nanyang Technological University
publishDate	2022
url	https://hdl.handle.net/10356/158353
_version_	1761781236686127104
spelling	sg-ntu-dr.10356-158353 2023-03-11T18:10:41Z Deep reinforcement learning-based dynamic scheduling Liu, Renke Rajesh Piplani School of Mechanical and Aerospace Engineering Carlos Toro MRPiplani@ntu.edu.sg Engineering::Computer science and engineering Engineering::Industrial engineering Doctor of Philosophy 2022-05-26T01:08:10Z 2022-05-26T01:08:10Z 2022 Thesis-Doctor of Philosophy Liu, R. (2022). Deep reinforcement learning-based dynamic scheduling. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/158353 https://hdl.handle.net/10356/158353 10.32657/10356/158353 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
