Fault-tolerant computation meets network coding: optimal scheduling in parallel computing

In large-scale parallel computing systems, machines and the network suffer from non-negligible faults, often leading to system crashes. The traditional method to increase reliability is to restart the failed jobs. To avoid unnecessary time wasted on reboots, we propose an optimal scheduling strategy...

Bibliographic Details
Main Authors: Li, Congduan, Zhang, Yiqian, Tan, Chee Wei
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2023
Online Access: https://hdl.handle.net/10356/172081