Exploring tabular Q-learning for single machine job dispatching
Saved in:
Main Author:
Other Authors:
Format: Final Year Project
Language: English
Published: 2019
Subjects:
Online Access: http://hdl.handle.net/10356/77872
Institution: Nanyang Technological University
Summary: Reinforcement Learning (RL) studies how an autonomous agent can learn, through interaction with its environment, to choose actions that achieve its goals. This report explores the application of tabular Q-Learning, a form of RL, to single machine job dispatching (SMJD) problems. In tabular Q-Learning, the agent learns the most appropriate action for each environment state by storing state-action value functions in a discretised state-action table. Although existing studies have demonstrated the potential of Q-Learning in such scheduling problems, this report identifies a limitation of current implementations: the state-action tables are pre-determined by trial-and-error experiments before learning is applied to the scheduling problem, which can be a very tedious process. This project therefore proposes a clustering method, K-Means Clustering, to determine the state-action tables automatically and dynamically. Under the proposed method, these tables are specific to SMJD problems with different system objectives, and they also enabled the agent to achieve a higher rate of learning convergence. The project also investigated the feasibility of using Q-Learning to formulate composite dispatching rules for SMJD problems with multiple objectives, an area of research not yet widely studied. Overall, this study provides encouraging results for the future application of Q-Learning to more complex production scheduling problems.
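The abstract's core idea, clustering observed state features so the centroids define the rows of a discretised Q-table, can be sketched as follows. This is an illustrative sketch only: the report's actual state features, dispatching actions, and reward signal are not given here, so the queue-length / mean-processing-time features and the SPT / EDD rules below are assumptions, and the K-Means implementation is a plain textbook version rather than the report's.

```python
import random

def nearest(p, centroids):
    """Index of the closest centroid = discretised state id."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))

def kmeans(points, k, iters=50, seed=0):
    """Plain K-means; the resulting centroids define the Q-table rows."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its cluster (keep old one if empty).
        centroids = [tuple(sum(d) / len(c) for d in zip(*c)) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

ACTIONS = ["SPT", "EDD"]  # assumed dispatching rules acting as Q-table columns

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One standard tabular Q-learning update on the discretised table."""
    Q[s][a] += alpha * (r + gamma * max(Q[s_next].values()) - Q[s][a])

# Build the table from sampled (queue length, mean processing time) features.
rng = random.Random(1)
features = [(rng.randint(0, 20), rng.uniform(1.0, 10.0)) for _ in range(200)]
centroids = kmeans(features, k=5)
Q = {s: {a: 0.0 for a in ACTIONS} for s in range(len(centroids))}

s = nearest(features[0], centroids)
s_next = nearest(features[1], centroids)
q_update(Q, s, "SPT", r=1.0, s_next=s_next)
print(round(Q[s]["SPT"], 3))  # prints 0.1 (first update from an all-zero table)
```

Because the centroids are learned from the data the agent actually observes, the discretisation adapts to each problem instance instead of being fixed by trial and error beforehand, which is the tedium the report aims to remove.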