DRAM failure prediction in AIOps: Empirical evaluation, challenges and opportunities

DRAM failure prediction is a vital task in AIOps and is crucial for maintaining the reliability and sustained service of large-scale data centers. However, limited work has been done on DRAM failure prediction, mainly due to the lack of publicly available datasets. This paper presents a comprehensive...

Bibliographic Details
Main Authors: WU, Zhiyue, XU, Hongzuo, PANG, Guansong, YU, Fengyuan, WANG, Yijie, JIAN, Songlei, WANG, Yongjun
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2021
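Subjects: DRAM failure prediction; Data center reliability; Cloud services; Databases and Information Systems; Data Storage Systems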
Online Access: https://ink.library.smu.edu.sg/sis_research/7135
https://ink.library.smu.edu.sg/context/sis_research/article/8138/viewcontent/2104.15052.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-8138
record_format dspace
spelling sg-smu-ink.sis_research-8138 2022-04-22T04:29:25Z DRAM failure prediction in AIOps: Empirical evaluation, challenges and opportunities WU, Zhiyue XU, Hongzuo PANG, Guansong YU, Fengyuan WANG, Yijie JIAN, Songlei WANG, Yongjun DRAM failure prediction is a vital task in AIOps and is crucial for maintaining the reliability and sustained service of large-scale data centers. However, limited work has been done on DRAM failure prediction, mainly due to the lack of publicly available datasets. This paper presents a comprehensive empirical evaluation of diverse machine learning techniques for DRAM failure prediction using a large-scale multisource dataset, including more than three million records of kernel, address, and mcelog data, provided by Alibaba Cloud through the PAKDD 2021 competition. In particular, we first formulate the problem as a multiclass classification task and exhaustively evaluate seven popular/state-of-the-art classifiers on both individual and multiple data sources. We then formulate the problem as an unsupervised anomaly detection task and evaluate three state-of-the-art anomaly detectors. Further, based on the empirical results and our experience participating in this competition, we discuss major challenges and present future research opportunities in this task. 2021-04-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/7135 https://ink.library.smu.edu.sg/context/sis_research/article/8138/viewcontent/2104.15052.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University DRAM failure prediction Data center reliability Cloud services Databases and Information Systems Data Storage Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic DRAM failure prediction
Data center reliability
Cloud services
Databases and Information Systems
Data Storage Systems
description DRAM failure prediction is a vital task in AIOps and is crucial for maintaining the reliability and sustained service of large-scale data centers. However, limited work has been done on DRAM failure prediction, mainly due to the lack of publicly available datasets. This paper presents a comprehensive empirical evaluation of diverse machine learning techniques for DRAM failure prediction using a large-scale multisource dataset, including more than three million records of kernel, address, and mcelog data, provided by Alibaba Cloud through the PAKDD 2021 competition. In particular, we first formulate the problem as a multiclass classification task and exhaustively evaluate seven popular/state-of-the-art classifiers on both individual and multiple data sources. We then formulate the problem as an unsupervised anomaly detection task and evaluate three state-of-the-art anomaly detectors. Further, based on the empirical results and our experience participating in this competition, we discuss major challenges and present future research opportunities in this task.
format text
author WU, Zhiyue
XU, Hongzuo
PANG, Guansong
YU, Fengyuan
WANG, Yijie
JIAN, Songlei
WANG, Yongjun
author_sort WU, Zhiyue
title DRAM failure prediction in AIOps: Empirical evaluation, challenges and opportunities
publisher Institutional Knowledge at Singapore Management University
publishDate 2021