Reinforcement learning for blast furnace ironmaking operation with safety and partial observation considerations

Making proper decisions online in a complex environment during blast furnace (BF) operation is a key factor in achieving long-term success and profitability in the steel manufacturing industry. Regulatory lags, ore source uncertainty, and the need for continuous decisions make it a challenging task....

Bibliographic Details
Main Authors: Jiang, Ke; Jiang, Zhaohui; Jiang, Xudong; Xie, Yongfang; Gui, Weihua
Other Authors: School of Electrical and Electronic Engineering
Format: Article
Language: English
Published: 2024
Subjects: Engineering; Blast furnace; Offline reinforcement learning
Online Access: https://hdl.handle.net/10356/177989
Institution: Nanyang Technological University
id sg-ntu-dr.10356-177989
record_format dspace
spelling sg-ntu-dr.10356-177989
2024-06-04T00:57:57Z
Reinforcement learning for blast furnace ironmaking operation with safety and partial observation considerations
Jiang, Ke
Jiang, Zhaohui
Jiang, Xudong
Xie, Yongfang
Gui, Weihua
School of Electrical and Electronic Engineering
Engineering
Blast furnace
Offline reinforcement learning
Making proper decisions online in a complex environment during blast furnace (BF) operation is a key factor in achieving long-term success and profitability in the steel manufacturing industry. Regulatory lags, ore source uncertainty, and the need for continuous decisions make it a challenging task. Recently, reinforcement learning (RL) has demonstrated state-of-the-art performance in various sequential decision-making problems. However, the strict safety requirements make it impossible to explore optimal decisions through online trial and error. Therefore, this article proposes a novel offline RL approach designed to ensure safety, maximize return, and address issues of partially observed states. Specifically, it utilizes an off-policy actor-critic framework to infer the optimal decision from expert operation trajectories. The "actor" in this framework is jointly trained by supervision and evaluation signals to make decisions with low risk and high return. Furthermore, we investigate a recurrent version of the actor and critic networks to better capture the complete observations, which solves the partially observed Markov decision process (POMDP) arising from sensor limitations. Verification within the BF smelting process demonstrates the improvements of the proposed algorithm in performance, i.e., safety and return.
This work was supported in part by the National Major Scientific Research Equipment of China under Grant 61927803; in part by the Science and Technology Innovation Program of Hunan Province under Grant 2021RC4054; in part by the Key-Area Research and Development Program of Guangdong Province under Grant 2021B0101200005; and in part by the China Scholarship Council under Grant 202106370153.
2024-06-04T00:57:57Z
2024-06-04T00:57:57Z
2024
Journal Article
Jiang, K., Jiang, Z., Jiang, X., Xie, Y. & Gui, W. (2024). Reinforcement learning for blast furnace ironmaking operation with safety and partial observation considerations. IEEE Transactions on Neural Networks and Learning Systems, 35(3), 3077-3090. https://dx.doi.org/10.1109/TNNLS.2023.3340741
2162-237X
https://hdl.handle.net/10356/177989
10.1109/TNNLS.2023.3340741
38231813
2-s2.0-85182952142
3
35
3077
3090
en
IEEE Transactions on Neural Networks and Learning Systems
© 2024 IEEE. All rights reserved.
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering
Blast furnace
Offline reinforcement learning
description Making proper decisions online in a complex environment during blast furnace (BF) operation is a key factor in achieving long-term success and profitability in the steel manufacturing industry. Regulatory lags, ore source uncertainty, and the need for continuous decisions make it a challenging task. Recently, reinforcement learning (RL) has demonstrated state-of-the-art performance in various sequential decision-making problems. However, the strict safety requirements make it impossible to explore optimal decisions through online trial and error. Therefore, this article proposes a novel offline RL approach designed to ensure safety, maximize return, and address issues of partially observed states. Specifically, it utilizes an off-policy actor-critic framework to infer the optimal decision from expert operation trajectories. The "actor" in this framework is jointly trained by supervision and evaluation signals to make decisions with low risk and high return. Furthermore, we investigate a recurrent version of the actor and critic networks to better capture the complete observations, which solves the partially observed Markov decision process (POMDP) arising from sensor limitations. Verification within the BF smelting process demonstrates the improvements of the proposed algorithm in performance, i.e., safety and return.
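The description names two mechanisms: an actor trained jointly by a supervision signal and an evaluation signal, and recurrent networks that summarize past observations. The sketch below illustrates both ideas in their simplest form. It is not the article's algorithm. The function names, the squared-error imitation term, the weight `bc_weight`, and the plain tanh recurrence are all assumptions introduced here for illustration.

```python
import numpy as np


def joint_actor_loss(policy_actions, expert_actions, q_values, bc_weight=1.0):
    """Hypothetical joint actor objective (an assumption, not the article's loss).

    Supervision signal: squared distance between the policy's actions and the
    expert's logged actions. Evaluation signal: the critic's Q-values for the
    policy's actions, negated so that higher value means lower loss.
    """
    policy_actions = np.asarray(policy_actions, dtype=float)
    expert_actions = np.asarray(expert_actions, dtype=float)
    q_values = np.asarray(q_values, dtype=float)
    # Mean over the batch of the per-sample squared action error.
    supervision = np.mean(np.sum((policy_actions - expert_actions) ** 2, axis=-1))
    # Negative mean critic value over the batch.
    evaluation = -np.mean(q_values)
    return bc_weight * supervision + evaluation


def encode_history(observations, w_in, w_rec):
    """Fold a sequence of partial observations into one hidden state.

    A plain tanh recurrence stands in for whatever recurrent cell is actually
    used; w_in has shape (hidden, obs_dim) and w_rec has shape (hidden, hidden).
    """
    hidden = np.zeros(w_rec.shape[0])
    for obs in np.asarray(observations, dtype=float):
        hidden = np.tanh(w_in @ obs + w_rec @ hidden)
    return hidden


# Two logged samples with two-dimensional actions (made-up numbers).
loss = joint_actor_loss(
    policy_actions=[[0.0, 1.0], [1.0, 1.0]],
    expert_actions=[[0.0, 0.0], [1.0, 1.0]],
    q_values=[2.0, 4.0],
)
# The hidden state would be fed to the actor and critic in place of a single observation.
state = encode_history([[1.0, 2.0], [0.5, -0.5]], np.eye(2), 0.1 * np.eye(2))
```

In this sketch a larger `bc_weight` keeps the policy closer to the expert trajectories, and a smaller one lets the critic's evaluation dominate.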
author2 School of Electrical and Electronic Engineering
format Article
author Jiang, Ke
Jiang, Zhaohui
Jiang, Xudong
Xie, Yongfang
Gui, Weihua
author_sort Jiang, Ke
title Reinforcement learning for blast furnace ironmaking operation with safety and partial observation considerations
publishDate 2024
url https://hdl.handle.net/10356/177989
_version_ 1806059906075197440