Reinforcement learning for blast furnace ironmaking operation with safety and partial observation considerations
Making proper decisions online in a complex environment during blast furnace (BF) operation is a key factor in achieving long-term success and profitability in the steel manufacturing industry. Regulatory lags, ore source uncertainty, and the need for continuous decisions make this a challenging task....
Main Authors: | Jiang, Ke; Jiang, Zhaohui; Jiang, Xudong; Xie, Yongfang; Gui, Weihua |
---|---|
Other Authors: | School of Electrical and Electronic Engineering |
Format: | Article |
Language: | English |
Published: | 2024 |
Subjects: | Engineering; Blast furnace; Offline reinforcement learning |
Online Access: | https://hdl.handle.net/10356/177989 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-177989 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-177989 2024-06-04T00:57:57Z
This work was supported in part by the National Major Scientific Research Equipment of China under Grant 61927803; in part by the Science and Technology Innovation Program of Hunan Province under Grant 2021RC4054; in part by the Key-Area Research and Development Program of Guangdong Province under Grant 2021B0101200005; and in part by the China Scholarship Council under Grant 202106370153.
2024-06-04T00:57:57Z 2024-06-04T00:57:57Z 2024 Journal Article
Jiang, K., Jiang, Z., Jiang, X., Xie, Y. & Gui, W. (2024). Reinforcement learning for blast furnace ironmaking operation with safety and partial observation considerations. IEEE Transactions on Neural Networks and Learning Systems, 35(3), 3077-3090. https://dx.doi.org/10.1109/TNNLS.2023.3340741
2162-237X https://hdl.handle.net/10356/177989 10.1109/TNNLS.2023.3340741 38231813 2-s2.0-85182952142 3 35 3077 3090 en IEEE Transactions on Neural Networks and Learning Systems © 2024 IEEE. All rights reserved. |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering; Blast furnace; Offline reinforcement learning |
description |
Making proper decisions online in a complex environment during blast furnace (BF) operation is a key factor in achieving long-term success and profitability in the steel manufacturing industry. Regulatory lags, ore source uncertainty, and the need for continuous decisions make this a challenging task. Recently, reinforcement learning (RL) has demonstrated state-of-the-art performance in various sequential decision-making problems. However, strict safety requirements make it impossible to explore optimal decisions through online trial and error. Therefore, this article proposes a novel offline RL approach designed to ensure safety, maximize return, and address the issue of partially observed states. Specifically, it uses an off-policy actor-critic framework to infer the optimal decision from expert operation trajectories. The "actor" in this framework is trained jointly by supervision and evaluation signals so that it makes decisions with low risk and high return. Furthermore, we investigate a recurrent version of the actor and critic networks to better capture the complete observations, which solves the partially observable Markov decision process (POMDP) arising from sensor limitations. Verification within the BF smelting process demonstrates that the proposed algorithm improves performance, i.e., safety and return. |
author2 |
School of Electrical and Electronic Engineering |
format |
Article |
author |
Jiang, Ke; Jiang, Zhaohui; Jiang, Xudong; Xie, Yongfang; Gui, Weihua |
author_sort |
Jiang, Ke |
title |
Reinforcement learning for blast furnace ironmaking operation with safety and partial observation considerations |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/177989 |
_version_ |
1806059906075197440 |
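
The abstract describes an actor trained jointly by a supervision signal and an evaluation signal, with recurrent networks summarizing partially observed states. The record does not give the actual loss or architecture, so the following is only a minimal sketch under assumptions: it uses a plain tanh RNN and a generic "behavior cloning plus critic value" objective (in the spirit of well-known offline RL formulations), not the authors' method. All names, shapes, weights, and the `alpha` trade-off parameter are hypothetical.

```python
import numpy as np


def rnn_summary(obs_seq, w_in, w_rec):
    """Fold an observation history of shape (T, obs_dim) into one hidden vector."""
    h = np.zeros(w_rec.shape[0])
    for obs in obs_seq:
        # Plain tanh recurrence; stands in for any recurrent encoder.
        h = np.tanh(w_in @ obs + w_rec @ h)
    return h


def actor_action(h, w_pi):
    """Hypothetical actor: linear head squashed to [-1, 1]."""
    return np.tanh(w_pi @ h)


def critic_value(h, action, w_q):
    """Hypothetical critic: toy linear Q(h, a), standing in for a learned critic."""
    return float(w_q @ np.concatenate([h, action]))


def joint_actor_loss(h, expert_action, w_pi, w_q, alpha=1.0):
    """Supervision term (imitate the expert) minus alpha times the evaluation term."""
    action = actor_action(h, w_pi)
    supervision = float(np.mean((action - expert_action) ** 2))
    evaluation = critic_value(h, action, w_q)
    return supervision - alpha * evaluation, supervision, evaluation


# Assumed toy dimensions and randomly initialized weights (illustration only).
rng = np.random.default_rng(0)
obs_dim, hid_dim, act_dim, T = 4, 8, 2, 5
w_in = rng.normal(scale=0.5, size=(hid_dim, obs_dim))
w_rec = rng.normal(scale=0.5, size=(hid_dim, hid_dim))
w_pi = rng.normal(scale=0.5, size=(act_dim, hid_dim))
w_q = rng.normal(scale=0.5, size=(hid_dim + act_dim,))

obs_seq = rng.normal(size=(T, obs_dim))   # a short window of sensor readings
expert_action = np.array([0.3, -0.2])     # the logged operator decision

h = rnn_summary(obs_seq, w_in, w_rec)
loss, supervision, evaluation = joint_actor_loss(
    h, expert_action, w_pi, w_q, alpha=0.5
)
print(f"supervision={supervision:.4f} evaluation={evaluation:.4f} loss={loss:.4f}")
```

In this sketch, a larger `alpha` pushes the actor toward actions the critic rates highly, while a smaller `alpha` keeps it closer to the logged expert decisions, which is one plausible way to trade return against risk when no online exploration is allowed.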