Physics-informed machine learning for green data center operations
The data center (DC) industry has grown rapidly in recent years to meet ever-increasing demand for cloud computing and storage. The dramatic increase in DC scale brings substantial challenges for DC operations, which aim to maintain business continuity and reduce operating costs. Current DCs are...
Main Author: | Wang, Ruihang |
---|---|
Other Authors: | Tan Rui |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2023 |
Subjects: | Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence |
Online Access: | https://hdl.handle.net/10356/172421 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-172421 |
---|---|
record_format |
dspace |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence |
spellingShingle |
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Wang, Ruihang Physics-informed machine learning for green data center operations |
description |
The data center (DC) industry has grown rapidly in recent years to meet ever-increasing demand for cloud computing and storage. The dramatic increase in DC scale brings substantial challenges for DC operations, which aim to maintain business continuity and reduce operating costs. Current DCs are mostly operated reactively: they adopt feedback controllers and rely on empirical best practices. However, existing operating principles focus only on keeping temperatures within certain ranges and do not take system power usage into account. To achieve low-power DC operations, proactive and intelligent solutions are highly desirable. Machine learning (ML) approaches based on deep neural networks have been considered for developing such solutions. However, applying these advanced ML algorithms to DC operations faces two major challenges. First, ML often requires a large volume of training data, including data from abnormal cases, which are hard to obtain from a stably operated DC. Second, the prevailing risk-averse mindset in the DC industry hinders the wide deployment of ML-based solutions. To unleash the potential of ML for DC operations, this thesis proposes integrating the DC's "physics priors" into the learning and deployment of ML algorithms. The proposed physics-informed ML solutions advance DC operations in the following three stages.
First, this thesis builds predictive models that characterize the thermodynamics and power usage of a DC. To improve model accuracy and reduce computational overhead, it proposes a knowledge-based model calibration and reduction approach for optimizing the data hall thermodynamics model. The evaluation shows that the method achieves a temperature prediction error below 1 °C while accelerating the simulations a thousandfold.
Second, this thesis develops prescriptive models that guide DC cooling control with ML-based techniques. To enforce thermal safety constraints during state exploration, it designs a physics-guided learning framework that applies offline imitation learning and online post-hoc rectification to prevent thermal safety violations. In particular, the post-hoc rectification searches for the minimum modification to the ML-recommended action such that the rectified action does not cause a thermal safety violation. The rectification is built on the previously calibrated thermodynamics models. The evaluation shows that the proposed approach saves 14% to 26% of power usage compared with conventional feedback control while satisfying the safety constraints during ML training.
Third, this thesis adapts the ML-based policy to the evolving DC environment. To expedite adaptation while respecting safety, it develops a physics-informed lifelong learning approach that supervises data collection with the previously identified transition model, fits power usage and residual thermal models, pretrains the agent by interacting with these models, and deploys the agent for further fine-tuning. The approach uses known physical laws to inform the modeling of transitions and power usage, which improves extrapolation to unseen states. The evaluation shows that the approach saves 5.7% to 13.8% of power usage compared with conventional feedback control and adapts 8x to 10x faster than native fine-tuning, with at most 0.74 °C of temperature overshoot.
In summary, the proposed solutions, which integrate state-of-the-art ML algorithms with physics priors, can accurately simulate a DC and optimize it to achieve intelligent, low-power operations. |
author2 |
Tan Rui |
author_facet |
Tan Rui Wang, Ruihang |
format |
Thesis-Doctor of Philosophy |
author |
Wang, Ruihang |
author_sort |
Wang, Ruihang |
title |
Physics-informed machine learning for green data center operations |
publisher |
Nanyang Technological University |
publishDate |
2023 |
url |
https://hdl.handle.net/10356/172421 |
_version_ |
1787590739552632832 |
spelling |
sg-ntu-dr.10356-172421 2024-01-04T06:32:51Z Physics-informed machine learning for green data center operations Wang, Ruihang Tan Rui Wen Yonggang School of Computer Science and Engineering tanrui@ntu.edu.sg, YGWEN@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Doctor of Philosophy 2023-12-11T03:00:28Z 2023-12-11T03:00:28Z 2023 Thesis-Doctor of Philosophy Wang, R. (2023). Physics-informed machine learning for green data center operations. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/172421 https://hdl.handle.net/10356/172421 10.32657/10356/172421 en NRF2020NRF-CG001-027 This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University |