Design and Analysis of Peer-to-Peer Fault-Tolerance Approach in a Grid Computing System

A grid computing system allows a large, complex computing task to use substantial computing resources efficiently by splitting the task into many compute processes that are distributed and executed in parallel across many grid nodes. Under such a paradigm, system fault tolerance is the major issue as the fai...

Bibliographic Details
Main Authors: Thagorn Tangmankhong, Peerapon Siripongwutikorn, Tiranee Achalakul
Format: Journal article
Language: English
Published: Science Faculty of Chiang Mai University 2019
Online Access: http://it.science.cmu.ac.th/ejournal/dl.php?journal_id=8043
http://cmuir.cmu.ac.th/jspui/handle/6653943832/63893
Institution: Chiang Mai University
id th-cmuir.6653943832-63893
record_format dspace
spelling th-cmuir.6653943832-63893 2019-05-07T09:59:37Z Design and Analysis of Peer-to-Peer Fault-Tolerance Approach in a Grid Computing System Thagorn Tangmankhong Peerapon Siripongwutikorn Tiranee Achalakul 2019-05-07T09:59:37Z 2019-05-07T09:59:37Z 2017 Journal article 0125-2526 http://it.science.cmu.ac.th/ejournal/dl.php?journal_id=8043 http://cmuir.cmu.ac.th/jspui/handle/6653943832/63893 Eng Science Faculty of Chiang Mai University
institution Chiang Mai University
building Chiang Mai University Library
country Thailand
collection CMU Intellectual Repository
language English
description A grid computing system allows a large, complex computing task to use substantial computing resources efficiently by splitting the task into many compute processes that are distributed and executed in parallel across many grid nodes. Under such a paradigm, system fault tolerance is the major issue, because the failure of one grid node results in failure of the whole task. Most fault-tolerance techniques for grid computing systems are based on periodically saving checkpoint data, which is used to roll the system back to the last good operating state when a failure occurs. In this paper, a fault-tolerance technique based on peer-to-peer replication of checkpoint data is designed and analyzed. The idea is to replicate chunks of checkpoint data at different backup nodes so that the failure recovery process completes faster. The replication time under the peer-to-peer replication procedure is analyzed to obtain proper choices of chunk size and backup group size. Peer-to-peer replication also yields a significant reduction in recovery time compared with the traditional client-server approach.
format Journal article
author Thagorn Tangmankhong
Peerapon Siripongwutikorn
Tiranee Achalakul
title Design and Analysis of Peer-to-Peer Fault-Tolerance Approach in a Grid Computing System
publisher Science Faculty of Chiang Mai University
publishDate 2019
url http://it.science.cmu.ac.th/ejournal/dl.php?journal_id=8043
http://cmuir.cmu.ac.th/jspui/handle/6653943832/63893
_version_ 1681425980065841152