Fault Tolerance for Parallel Applications through Replication

Based on the technique of replication, an efficient fault-tolerant model for parallel computing on workstation clusters is proposed. The model is built on top of a runtime system which supports resource allocation for parallel applications running on heterogeneous workstation clusters. According to...

وصف كامل

محفوظ في:
التفاصيل البيبلوغرافية
المؤلف الرئيسي: SHUM, Kam Hong
التنسيق: text
اللغة:English
منشور في: Institutional Knowledge at Singapore Management University 1997
الموضوعات:
الوصول للمادة أونلاين:https://ink.library.smu.edu.sg/sis_research/1054
http://dx.doi.org/10.1109/ICICS.1997.652234
الوسوم: إضافة وسم
لا توجد وسوم, كن أول من يضع وسما على هذه التسجيلة!
المؤسسة: Singapore Management University
اللغة: English
الوصف
الملخص:Based on the technique of replication, an efficient fault-tolerant model for parallel computing on workstation clusters is proposed. The model is built on top of a runtime system which supports resource allocation for parallel applications running on heterogeneous workstation clusters. According to the results of resource allocation, replicated parallel applications can minimize their resource consumption by runtime reconfiguration. Besides, checkpointed states only transfer among replicated applications, no expensive disk read/write operations are therefore required.