Effects of action masking on deep reinforcement learning for inventory management

Bibliographic Details
Main Author: Goh, Bryan Zheng Ting
Other Authors: Lee Bu Sung, Francis
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Online Access: https://hdl.handle.net/10356/166091
Institution: Nanyang Technological University
Description
Summary: Inventory management has always been a crucial part of supply chain management, and managing it carelessly leads to unnecessary inventory costs such as lost sales and holding costs. Over the years, many researchers in operations research have investigated solutions and systems to manage inventory better and to lower inventory cost as much as possible. With recent advances in reinforcement learning and deep neural networks, there has been rising interest in using deep reinforcement learning to train an artificial agent that can manage inventory and minimize inventory costs. This report develops a solution for a single-retailer, single-item inventory management environment with stochastic demand using a Deep Q-Network (DQN). Although recent works have applied DQN to inventory management, few have investigated the effects of action masking in this problem domain. This report therefore investigates different methods of action masking and analyzes their effects on the speed of convergence during the training phase and on additional metrics such as mean reward, fill rate, and service level during the inference phase. It also analyzes the effects of different demand distributions and whether they affect the training of a DQN agent.
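
To make the idea of action masking concrete, the following is a minimal illustrative sketch and not the report's implementation. It assumes that each discrete action is an order quantity, that an action is invalid when the order would exceed a warehouse capacity, and that fill rate and service level are computed as the share of demand served and the share of periods without a stockout. The function names, the capacity rule, and the sample numbers are all hypothetical.

```python
import numpy as np


def capacity_mask(inventory: int, capacity: int, n_actions: int) -> np.ndarray:
    """Assumed masking rule: ordering a units is valid only if it fits in the warehouse."""
    order_qty = np.arange(n_actions)
    return inventory + order_qty <= capacity


def masked_greedy_action(q_values: np.ndarray, mask: np.ndarray) -> int:
    """Greedy action over valid actions only: invalid Q-values are set to -inf."""
    masked_q = np.where(mask, q_values, -np.inf)
    return int(np.argmax(masked_q))


def fill_rate(demand, sales) -> float:
    """Assumed definition: fraction of total demand that was served from stock."""
    return float(np.sum(sales) / np.sum(demand))


def service_level(demand, sales) -> float:
    """Assumed definition: fraction of periods in which demand was fully met."""
    return float(np.mean(np.asarray(sales) >= np.asarray(demand)))


# Hypothetical Q-values for ordering 0..4 units, with 8 units on hand and capacity 10.
q = np.array([0.1, 0.5, 0.9, 1.4, 2.0])
mask = capacity_mask(inventory=8, capacity=10, n_actions=5)
action = masked_greedy_action(q, mask)
print(mask, action)
```

Without the mask, the greedy choice here would be the largest order, which the assumed capacity rule forbids; with the mask, the agent falls back to the best valid order. Other masking rules can be swapped in by changing only `capacity_mask`.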