Fault Tolerance for Parallel Applications through Replication
Main Author: | |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 1997 |
Subjects: | |
Online Access: | https://ink.library.smu.edu.sg/sis_research/1054 http://dx.doi.org/10.1109/ICICS.1997.652234 |
Institution: | Singapore Management University |
Summary: | Based on the technique of replication, an efficient fault-tolerant model for parallel computing on workstation clusters is proposed. The model is built on top of a runtime system that supports resource allocation for parallel applications running on heterogeneous workstation clusters. Using the results of resource allocation, replicated parallel applications can minimize their resource consumption through runtime reconfiguration. In addition, checkpointed states are transferred only among the replicated applications, so no expensive disk read/write operations are required. |