Latent error prediction and fault localization for microservice applications by learning from system trace logs

In the production environment, a large part of microservice failures are related to the complex and dynamic interactions and runtime environments, such as those related to multiple instances, environmental configurations, and asynchronous interactions of microservices. Due to the complexity and dyna...

Bibliographic Details
Main Authors: ZHOU, Xiang, PENG, Xin, XIE, Tao, SUN, Jun, JI, Chao, LIU, Dewei, XIANG, Qilin, HE, Chuan
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2019
Subjects: Information Security
Online Access: https://ink.library.smu.edu.sg/sis_research/4636
https://ink.library.smu.edu.sg/context/sis_research/article/5639/viewcontent/esecfse19_microservice.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-5639
record_format dspace
spelling sg-smu-ink.sis_research-5639 2020-01-02T08:32:08Z Latent error prediction and fault localization for microservice applications by learning from system trace logs ZHOU, Xiang PENG, Xin XIE, Tao SUN, Jun JI, Chao LIU, Dewei XIANG, Qilin HE, Chuan 2019-08-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/4636 https://ink.library.smu.edu.sg/context/sis_research/article/5639/viewcontent/esecfse19_microservice.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Information Security
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Information Security
description In the production environment, a large part of microservice failures are related to the complex and dynamic interactions and runtime environments, such as those related to multiple instances, environmental configurations, and asynchronous interactions of microservices. Due to the complexity and dynamism of these failures, it is often hard to reproduce and diagnose them in testing environments. It is desirable yet still challenging that these failures can be detected and the faults can be located at runtime of the production environment to allow developers to resolve them efficiently. To address this challenge, in this paper, we propose MEPFL, an approach of latent error prediction and fault localization for microservice applications by learning from system trace logs. Based on a set of features defined on the system trace logs, MEPFL trains prediction models at both the trace level and the microservice level using the system trace logs collected from automatic executions of the target application and its faulty versions produced by fault injection. The prediction models thus can be used in the production environment to predict latent errors, faulty microservices, and fault types for trace instances captured at runtime. We implement MEPFL based on the infrastructure systems of container orchestrator and service mesh, and conduct a series of experimental studies with two open-source microservice applications (one of them being the largest open-source microservice application to our best knowledge). The results indicate that MEPFL can achieve high accuracy in intra-application prediction of latent errors, faulty microservices, and fault types, and outperforms a state-of-the-art approach of failure diagnosis for distributed systems. The results also show that MEPFL can effectively predict latent errors caused by real-world fault cases.
format text
author ZHOU, Xiang
PENG, Xin
XIE, Tao
SUN, Jun
JI, Chao
LIU, Dewei
XIANG, Qilin
HE, Chuan
author_sort ZHOU, Xiang
title Latent error prediction and fault localization for microservice applications by learning from system trace logs
publisher Institutional Knowledge at Singapore Management University
publishDate 2019
url https://ink.library.smu.edu.sg/sis_research/4636
https://ink.library.smu.edu.sg/context/sis_research/article/5639/viewcontent/esecfse19_microservice.pdf
_version_ 1770574945333542912