Deep reinforcement learning-based dynamic scheduling

Attempts to address the production scheduling problem thus far rely on simplifying assumptions, such as static environment and inflexible size of the problem, which compromises the schedule performance in practice due to many unpredictable disruptions to the system. Thus, the study of scheduling in...

Full description

Saved in:
Bibliographic Details
Main Author: Liu, Renke
Other Authors: Rajesh Piplani
Format: Thesis-Doctor of Philosophy
Language:English
Published: Nanyang Technological University 2022
Subjects:
Online Access:https://hdl.handle.net/10356/158353
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Description
Summary:Attempts to address the production scheduling problem thus far rely on simplifying assumptions, such as static environment and inflexible size of the problem, which compromises the schedule performance in practice due to many unpredictable disruptions to the system. Thus, the study of scheduling in the presence of real-time events, termed dynamic scheduling, continues to attract attention given the agility, flexibility, and timeliness modern production systems must deliver. Additionally, the changing nature of the manufacturing system also raises new challenges to existing scheduling strategies. At the front-end, the development of advanced data creation and exchange frameworks such as the Internet of things and cyber-physical system and their applications to the industrial environment have created an abundance of industrial data, while at the backend, edge and cloud computing technologies greatly enhance the capacity to process that data. Industrial data must be mined and analyzed so that the investment in infrastructure is not wasted, and the production system managed more effectively and in real-time. Many data-driven technologies have been adopted in scheduling research, a promising candidate among them being reinforcement learning (RL) which is able to build a direct mapping from observation of environment to actions that improve its performance. In this thesis, a deep multi-agent reinforcement learning (deep MARL) architecture is proposed to solve the dynamic scheduling problem (DSP). The deep reinforcement learning (DRL) algorithm is used to train the decentralized scheduling agents, to capture the relationship between information on the factory floor and scheduling objectives, with the aim of making real-time decisions for a manufacturing system with frequent unexpected events. Two major aspects of deep MARL application to DSP are addressed in this work, namely the conversion from traditional static scheduling problem (SSP) to dynamic scheduling in a practical context, and the adaptation of existing deep MARL algorithms to solve the scheduling problem in such an environment. Some impractical constraints of traditional studies are removed to create a research context that is closer to actual practice, result in a scheduling problem of variable size and scope. Specialized state and action representations that can handle the ever-changing specification of problem are developed; the criteria of feature selection in dynamic environment are also discussed. Recent progressions in DRL and MARL research are integrated into the proposed approach after selection and adaptation. In addition, various improvements to common deep MARL architecture are proposed, including the lightweight multilayer perceptron (MLP) encoder that is efficient in handling unstructured industrial data, a training scheme under the multi-agent architecture to improve the stability of training and overall performance, and knowledge-based reward-shaping techniques to decompose the joint reward signal into individual utilities to speed up the learning and encourage cooperative behavior between agents. Simulation studies are then conducted for the ablation study and validation. In the first stage, the performance of the proposed approach, either as individual components or as an integrated model, are tested in iterative simulation runs within which a unique instance of production is created. Meanwhile, a set of DRL-based approaches from recent publications are run in parallel. Results suggest that the contribution of each improvement is significant; the integrated architecture also delivers stronger performance than peer DRL-based approaches. For the validation, a set of priority rules that have strong performance in specified context and are widely applied in actual production scheduling are used as the benchmark. Proposed approach also provides performance gain compared to the strongest rule, with a minor increase in computation cost and negligible latency in decision-making.