Joint IT-facility optimization for green data centers via deep reinforcement learning

Bibliographic Details
Main Authors: Zhou, Xin; Wang, Ruihang; Wen, Yonggang; Tan, Rui
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2022
Online Access: https://hdl.handle.net/10356/158611
Institution: Nanyang Technological University
Description
Summary: The data center market is growing rapidly as data volumes and data-driven applications (e.g., machine learning, cloud storage, and the Internet of Things) expand. The shift of activities online during the COVID-19 pandemic has recently accelerated this growth. Reducing the energy consumption of data centers faces various challenges, which are further aggravated by the high temperature and humidity of tropical locations such as Singapore. The prevailing siloed approach of operating the information technology (IT) and facility systems separately has resulted in wasteful over-provisioning. Recently proposed approaches for minimizing energy usage under various constraints, including thermal safety, scale poorly with data center size and often yield non-optimal solutions. To advance the state of the art, we apply deep reinforcement learning (DRL) to address the scalability problem and achieve optimality over a long time horizon in reducing data center energy usage. In particular, we deploy a data-driven deep model and a physical-rule-based model in lieu of the physical data center during the training and validation phases to manage the thermal safety risks caused by DRL's strategy of learning from errors.