Reinforcement learning for blast furnace ironmaking operation with safety and partial observation considerations




Bibliographic Details
Main Authors:	Jiang, Ke; Jiang, Zhaohui; Jiang, Xudong; Xie, Yongfang; Gui, Weihua
Other Authors:	School of Electrical and Electronic Engineering
Format:	Article
Language:	English
Published:	2024
Subjects:	Engineering; Blast furnace; Offline reinforcement learning
Online Access:	https://hdl.handle.net/10356/177989
Institution:	Nanyang Technological University

Description
Summary:	Making proper decisions online in a complex environment during blast furnace (BF) operation is a key factor in achieving long-term success and profitability in the steel manufacturing industry. Regulatory lags, ore source uncertainty, and the need for continuous decisions make this a challenging task. Recently, reinforcement learning (RL) has demonstrated state-of-the-art performance in various sequential decision-making problems. However, strict safety requirements make it impossible to search for optimal decisions through online trial and error. Therefore, this article proposes a novel offline RL approach designed to ensure safety, maximize return, and address partially observed states. Specifically, it uses an off-policy actor-critic framework to infer the optimal decision from expert operation trajectories. The "actor" in this framework is jointly trained by supervision and evaluation signals so that it makes decisions with low risk and high return. Furthermore, we investigate recurrent versions of the actor and critic networks to better capture complete observations, which addresses the partially observed Markov decision process (POMDP) arising from sensor limitations. Verification on the BF smelting process demonstrates that the proposed algorithm improves performance in terms of both safety and return.
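The summary does not state the actor's loss function. One widely used way to train an actor jointly from a supervision signal (imitating expert actions) and an evaluation signal (the critic's Q-value) is the TD3+BC objective of Fujimoto and Gu (2021). The NumPy sketch below assumes that form, and uses a plain Elman RNN that folds the observation history into a single vector as a stand-in for the recurrent networks. All function names, dimensions, random weights, and the `alpha` value are hypothetical illustrations, not the authors' implementation. The sketch only evaluates the loss value; it has no gradients and no training loop.

```python
import numpy as np

def rnn_encode(obs_seq, W_in, W_h, b):
    """Fold an observation history of shape (T, obs_dim) into one hidden
    vector with a plain Elman RNN: h_t = tanh(W_in o_t + W_h h_{t-1} + b).
    Assumption: stands in for the paper's recurrent actor/critic encoders."""
    h = np.zeros(W_h.shape[0])
    for o in obs_seq:
        h = np.tanh(W_in @ o + W_h @ h + b)
    return h

def actor(h, W_a):
    """Deterministic policy mapping the history embedding to actions in [-1, 1]."""
    return np.tanh(W_a @ h)

def critic(h, a, w_q):
    """Linear Q-value of taking action a given history embedding h."""
    return float(w_q @ np.concatenate([h, a]))

def policy_actions(batch_obs, params):
    """Encode each history in the batch and return (embeddings, actions)."""
    hs = [rnn_encode(seq, params["W_in"], params["W_h"], params["b"]) for seq in batch_obs]
    pis = np.array([actor(h, params["W_a"]) for h in hs])
    return hs, pis

def actor_loss(batch_obs, expert_actions, params, alpha=2.5):
    """Assumed TD3+BC-style actor objective (value only):
        -lam * mean Q(h, pi(h))  +  mean ||pi(h) - a_expert||^2
    The first term is the evaluation signal (seek high return); the second is
    the supervision signal (stay near expert actions, i.e. low risk).
    lam = alpha / mean|Q| keeps the two terms on a comparable scale."""
    hs, pis = policy_actions(batch_obs, params)
    qs = np.array([critic(h, pi, params["w_q"]) for h, pi in zip(hs, pis)])
    lam = alpha / (np.abs(qs).mean() + 1e-8)
    bc = float(np.mean(np.sum((pis - expert_actions) ** 2, axis=1)))
    return float(-lam * qs.mean() + bc), bc

# Toy usage: random weights, a batch of 4 histories, each 5 steps long.
rng = np.random.default_rng(0)
obs_dim, hidden, act_dim = 3, 8, 2
params = {
    "W_in": rng.normal(scale=0.5, size=(hidden, obs_dim)),
    "W_h": rng.normal(scale=0.5, size=(hidden, hidden)),
    "b": np.zeros(hidden),
    "W_a": rng.normal(scale=0.5, size=(act_dim, hidden)),
    "w_q": rng.normal(size=hidden + act_dim),
}
batch_obs = rng.normal(size=(4, 5, obs_dim))
expert_actions = rng.uniform(-1, 1, size=(4, act_dim))
loss, bc = actor_loss(batch_obs, expert_actions, params)
```

Setting `alpha` toward zero reduces the objective to pure behavior cloning of the expert trajectories. Larger values let the critic pull the policy further from the recorded operation. In a safety-critical setting such as BF operation, that trade-off is the main tuning knob under this assumed formulation.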


Similar Items