Application of reinforcement learning to production system
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2018 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/75935 |
Institution: | Nanyang Technological University |
Summary: | The primary goal of this research is to obtain an optimal or near-optimal joint production and maintenance scheduling policy by means of reinforcement learning. In this research, we adopted a reinforcement learning algorithm to control the feeding interval and the maintenance state of the upstream station in a production system. With the help of this algorithm, the work-in-process (WIP) in the production system can be limited to a reasonable level, and machines are preventively maintained so that they remain functional.
By balancing the rewards and costs associated with WIP, maintenance and the idle loss of the bottleneck machine, the reinforcement learning algorithm is able to find an acceptable policy for adjusting the feeding rate and scheduling preventive maintenance for the upstream machine. However, reinforcement learning involves many parameters, and in practice suitable parameter values may vary widely from case to case. Five experiments were performed in this research: the first and second are validation experiments, the third and fourth explain the properties of the algorithm, and the fifth describes how fast the algorithm can learn to reach the target state of the upstream station. The developed model consists of reinforcement-learning-based decision-making agents coupled with a simulation model of the integrated production system. The smart agents determine the optimal or near-optimal action for each system state by interacting with their environment. |
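The record does not say which reinforcement learning algorithm the thesis uses or how its simulation model is built. As a rough illustration of the idea in the summary, here is a minimal sketch that applies tabular Q-learning (a stand-in chosen for brevity) to a hypothetical toy line. The state is the WIP level in front of the bottleneck plus the age of the upstream machine. The actions are a short feeding interval, a long feeding interval, or preventive maintenance. The reward trades bottleneck throughput against WIP holding, maintenance, breakdown and idle costs. Every name, constant and dynamic below (`step`, `q_learning`, `greedy_policy`, `MAX_WIP`, the age-based failure model, the cost values) is an assumption for illustration, not the author's model.

```python
import random
from collections import defaultdict

# Hypothetical toy line (assumed values, NOT the thesis's simulation model):
# an upstream machine feeds a buffer in front of a bottleneck machine.
MAX_WIP = 6   # buffer capacity in front of the bottleneck
MAX_AGE = 5   # upstream machine age at which a breakdown is certain
ACTIONS = ("feed_fast", "feed_slow", "maintain")
FEED_RATE = {"feed_fast": 2, "feed_slow": 1}  # parts released per period

THROUGHPUT_REWARD = 10.0  # bottleneck processes one part
IDLE_COST = 8.0           # bottleneck starves (idle loss)
HOLD_COST = 1.0           # per part of WIP per period
PM_COST = 5.0             # preventive maintenance
FAIL_COST = 20.0          # breakdown of the upstream machine


def step(state, action, rng):
    """Simulate one period of the toy line; return (next_state, reward)."""
    wip, age = state
    reward = 0.0
    if action == "maintain":
        age = 0              # preventive maintenance restores the machine
        reward -= PM_COST    # nothing is fed while maintaining
    elif rng.random() < age / MAX_AGE:
        age = 0              # breakdown: machine repaired, nothing is fed
        reward -= FAIL_COST
    else:
        wip = min(MAX_WIP, wip + FEED_RATE[action])  # feed beyond capacity is blocked
        age = min(MAX_AGE, age + 1)
    if wip > 0:              # bottleneck consumes one part per period
        wip -= 1
        reward += THROUGHPUT_REWARD
    else:
        reward -= IDLE_COST
    reward -= HOLD_COST * wip
    return (wip, age), reward


def q_learning(episodes=2000, horizon=50, alpha=0.1, gamma=0.95,
               epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> estimated discounted return
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward = step(state, action, rng)
            target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Map every (wip, age) state to the action with the highest Q-value."""
    return {
        (wip, age): max(ACTIONS, key=lambda a: q[((wip, age), a)])
        for wip in range(MAX_WIP + 1)
        for age in range(MAX_AGE + 1)
    }


if __name__ == "__main__":
    policy = greedy_policy(q_learning())
    for age in range(MAX_AGE + 1):
        print(f"wip=2, age={age}: {policy[(2, age)]}")
```

Tabular Q-learning keeps the example small. For a realistic line with many stations and continuous event times, the table grows quickly, and the parameter sensitivity the summary mentions (here `alpha`, `gamma` and `epsilon`, plus the cost weights) becomes a practical concern.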