Exploring tabular Q-learning for single machine job dispatching

Bibliographic Details
Main Author: Chai, Jeffrey Zhi Yang
Other Authors: Sivakumar A. I.
Format: Final Year Project
Language: English
Published: 2019
Online Access: http://hdl.handle.net/10356/77872
Institution: Nanyang Technological University
Description
Summary: Reinforcement Learning (RL) studies how an autonomous agent can learn, while interacting with its environment, to choose appropriate actions for achieving its goals. This report explores the application of tabular Q-Learning, a type of RL, to single machine job dispatching (SMJD) problems. In tabular Q-Learning, the agent learns the most appropriate action for each environment state by storing state-action pair values in a discretised state-action table. Although existing studies have demonstrated its potential in these scheduling problems, this report identifies one limitation of the current Q-Learning implementation: the state-action tables are pre-determined by trial-and-error experiments before learning is applied to the scheduling problem, which can be a very tedious process. This project therefore proposes a clustering method, K-Means Clustering, to determine the state-action tables automatically and dynamically instead. Under the proposed method, the state-action tables are specific to SMJD problems with different system objectives, and the agent was also able to achieve a higher rate of learning convergence. The project also investigated the feasibility of using Q-Learning to formulate composite dispatching rules for SMJD problems with multiple objectives, an area of research that has not yet been widely studied. Overall, the study provided encouraging results for the future application of Q-Learning to more complex production scheduling.
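
The idea described in the summary, clustering observed system states to define the rows of a Q-table and then applying the tabular Q-Learning update, can be illustrated with a minimal Python sketch. This is not the report's implementation: the scalar state feature, the two dispatching rules used as actions (`SPT`, `EDD`), the reward value, and the parameters `alpha` and `gamma` are all assumptions chosen for illustration, and the K-Means routine is a plain one-dimensional Lloyd's algorithm with a deterministic quantile initialisation.

```python
def nearest(centroids, x):
    """Index of the centroid closest to scalar feature x."""
    return min(range(len(centroids)), key=lambda i: abs(centroids[i] - x))


def kmeans_1d(values, k, iters=20):
    """Lloyd's algorithm on scalar features, with deterministic quantile init."""
    ordered = sorted(values)
    n = len(ordered)
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in ordered:
            buckets[nearest(centroids, v)].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(b) / len(b) if b else c
                     for b, c in zip(buckets, centroids)]
    return centroids


def q_update(q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Standard tabular Q-Learning update for one observed transition."""
    best_next = max(q[s_next])
    q[s][a] += alpha * (reward + gamma * best_next - q[s][a])


# Hypothetical actions: two common dispatching rules.
ACTIONS = ["SPT", "EDD"]

# Hypothetical samples of one scalar state feature (e.g. a workload measure).
samples = [0.1, 0.2, 0.15, 0.9, 1.0, 0.95, 2.0, 2.1, 1.9]

# Each cluster centroid becomes one discrete state (one row of the Q-table).
centroids = kmeans_1d(samples, k=3)
q = [[0.0] * len(ACTIONS) for _ in centroids]

# One illustrative transition with an assumed reward of -1.0.
s = nearest(centroids, 0.12)
s_next = nearest(centroids, 0.97)
q_update(q, s, ACTIONS.index("SPT"), reward=-1.0, s_next=s_next)
```

In this sketch the number of clusters `k` still has to be chosen by the user. Clustering only replaces the manual choice of state boundaries, which is the tedious trial-and-error step the summary refers to.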