Reliability assessment for distributed systems via communication abstraction and refinement

Distributed systems like cloud-based services are ever more popular. Assessing the reliability of distributed systems is highly non-trivial. Particularly, the order of executions among distributed components adds a dimension of non-determinism, which invalidates existing reliability assessment metho...

Full description

Saved in:
Bibliographic Details
Main Authors: GUI, Lin, SUN, Jun, LIU, Yang, DONG, Jin Song
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2015
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/4955
https://ink.library.smu.edu.sg/context/sis_research/article/5958/viewcontent/2771783.2771794.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English
Description
Summary:Distributed systems like cloud-based services are ever more popular. Assessing the reliability of distributed systems is highly non-trivial. Particularly, the order of executions among distributed components adds a dimension of non-determinism, which invalidates existing reliability assessment methods based on Markov chains. Probabilistic model checking based on models like Markov decision processes is designed to deal with scenarios involving both probabilistic behavior (e.g., reliabilities of system components) and non-determinism. However, its application is currently limited by state space explosion, which makes reliability assessment of distributed system particularly difficult. In this work, we improve the probabilistic model checking through a method of abstraction and reduction, which controls the communications among system components and actively reduces the size of each component. We prove the soundness and completeness of the proposed approach. Through an implementation in a software toolkit and evaluations with several systems, we show that our approach often reduces the size of the state space by several orders of magnitude, while still producing sound and accurate assessment.