Physics-informed machine learning for green data center operations

The data center (DC) industry has grown rapidly in recent years to meet ever-increasing cloud computing and storage demands. The dramatically increasing DC scale brings substantial challenges for DC operations that aim to maintain business continuity and reduce operating costs. Current DCs are...

Bibliographic Details
Main Author: Wang, Ruihang
Other Authors: Tan Rui
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2023
Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access: https://hdl.handle.net/10356/172421
Institution: Nanyang Technological University
id sg-ntu-dr.10356-172421
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
description The data center (DC) industry has grown rapidly in recent years to meet ever-increasing cloud computing and storage demands. The dramatically increasing DC scale brings substantial challenges for DC operations that aim to maintain business continuity and reduce operating costs. Current DCs are mostly operated reactively: they adopt feedback controllers and rely on empirical best practices. However, existing operating principles focus only on keeping temperatures within certain ranges and do not take system power usage into account. To achieve low-power DC operations, proactive and intelligent solutions are highly desirable. Machine learning (ML) approaches based on deep neural networks have been considered for developing such solutions. However, applying these advanced ML algorithms to DC operations faces two major challenges. First, ML often requires a large volume of training data, including data from abnormal cases, which are hard to obtain from a stably operated DC. Second, the prevailing risk-averse mindset in the DC industry hinders the wide deployment of ML-based solutions. To unleash the potential of ML for DC operations, this thesis proposes to integrate the DC's "physics priors" into the learning and deployment of the ML algorithms. The proposed physics-informed ML solutions advance DC operations in the following three stages.
First, this thesis builds predictive models that characterize the thermodynamics and power usage of a DC. To improve model accuracy and reduce computation overhead, the thesis proposes a knowledge-based model calibration and reduction approach for optimizing the data hall thermodynamics model. The evaluation shows that the method achieves sub-1°C temperature prediction error while accelerating the simulations by a thousand times.
Second, this thesis develops prescriptive models that instruct the DC cooling control with ML-based techniques. To address the challenge of enforcing thermal safety constraints during state exploration, this thesis designs a physics-guided learning framework that applies offline imitation learning and online post-hoc rectification to prevent thermal safety violations. In particular, the post-hoc rectification searches for the minimum modification to the ML-recommended action such that the rectified action does not violate thermal safety. The rectification is designed based on the previously calibrated thermodynamics models. The evaluation shows that the proposed approach reduces power usage by 14% to 26% compared with conventional feedback control while satisfying safety constraints during the ML training.
Third, this thesis adapts the ML-based policy to the evolving DC environment. To expedite the adaptation with safety considerations, this thesis develops a physics-informed lifelong learning approach that supervises data collection with the previously identified transition model, fits power usage and residual thermal models, pretrains the agent by interacting with these models, and deploys the agent for further fine-tuning. The proposed approach uses known physical laws to inform the modeling of transition and power usage, which improves the extrapolation ability to unseen states. The evaluation shows that the approach reduces power usage by 5.7% to 13.8% compared with conventional feedback control and adapts 8x to 10x faster than native fine-tuning with at most 0.74°C temperature overshoot.
In summary, the proposed solutions, which integrate state-of-the-art ML algorithms and physics priors, can accurately simulate a DC and optimize it to achieve intelligent and low-power operations.
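The post-hoc rectification step described in the abstract (find the smallest change to the ML-recommended action that keeps the predicted temperature safe) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function `rectify_action`, the candidate grid, the linear `toy_model`, and all numeric values are hypothetical stand-ins chosen for illustration, with a single cooling setpoint as the only action.

```python
from typing import Callable, Sequence


def rectify_action(
    recommended: float,
    candidates: Sequence[float],
    predict_max_temp: Callable[[float], float],
    temp_limit: float,
) -> float:
    """Return the candidate closest to `recommended` whose predicted
    temperature stays at or below `temp_limit` (hypothetical helper)."""
    # Try candidates in order of increasing distance from the ML recommendation,
    # so the first safe one found is the minimum modification on this grid.
    for action in sorted(candidates, key=lambda a: abs(a - recommended)):
        if predict_max_temp(action) <= temp_limit:
            return action
    # No candidate is predicted safe: fall back to the coolest prediction.
    return min(candidates, key=predict_max_temp)


def toy_model(setpoint_c: float) -> float:
    """Assumed stand-in for a calibrated thermodynamics model: the hottest
    predicted temperature is the supply-air setpoint plus a fixed 6.0 degrees."""
    return setpoint_c + 6.0


# Candidate supply-air setpoints from 18.0 to 28.0 in 0.5-degree steps (assumed).
candidates = [18.0 + 0.5 * i for i in range(21)]

safe = rectify_action(
    recommended=25.0,
    candidates=candidates,
    predict_max_temp=toy_model,
    temp_limit=27.0,
)
print(safe)  # 21.0
```

In this toy setting the recommended setpoint of 25.0 would be predicted to reach 31.0, above the assumed limit of 27.0, so the sketch returns 21.0, the nearest setpoint that the stand-in model predicts to be safe. A recommendation that is already safe is returned unchanged.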
author2 Tan Rui
format Thesis-Doctor of Philosophy
author Wang, Ruihang
title Physics-informed machine learning for green data center operations
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/172421
_version_ 1787590739552632832
spelling sg-ntu-dr.10356-172421; 2024-01-04T06:32:51Z; Physics-informed machine learning for green data center operations; Wang, Ruihang; Tan Rui; Wen Yonggang; School of Computer Science and Engineering; tanrui@ntu.edu.sg, YGWEN@ntu.edu.sg; Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Doctor of Philosophy; 2023-12-11T03:00:28Z; 2023-12-11T03:00:28Z; 2023; Thesis-Doctor of Philosophy; Wang, R. (2023). Physics-informed machine learning for green data center operations. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/172421; https://hdl.handle.net/10356/172421; 10.32657/10356/172421; en; NRF2020NRF-CG001-027; This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).; application/pdf; Nanyang Technological University