Fault-tolerant computation meets network coding: optimal scheduling in parallel computing
In large-scale parallel computing systems, machines and the network suffer from non-negligible faults, often leading to system crashes. The traditional method to increase reliability is to restart the failed jobs. To avoid unnecessary time wasted on reboots, we propose an optimal scheduling strategy...
Main Authors: | Li, Congduan, Zhang, Yiqian, Tan, Chee Wei |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Article |
Language: | English |
Published: | 2023 |
Online Access: | https://hdl.handle.net/10356/172081 |
Institution: | Nanyang Technological University |
Similar Items
- ASSESSING FAULT-TOLERANCE CONDITIONS FOR SURFACE CODE IMPLEMENTED WITH NOISY DEVICES
  by: CHAI JING HAO
  Published: (2020)
- Effective Fault Tolerance for Agent-Based Cluster Computing
  by: SHUM, Kam Hong
  Published: (1999)
- Fault-tolerant scheduling for differentiated classes of tasks with low replication cost in computational grids
  by: Zheng, Q., et al.
  Published: (2014)
- Fault-tolerant strategies for multi-rotor parcel delivery
  by: Tan, Jun Kiat
  Published: (2024)
- Fault-tolerant multicast routing protocol for real-time traffic on the internet
  by: Dechanuchit Katanyutaveetip
  Published: (2006)