Zeus: interpretable ML-based job scheduling in GPU datacentres

Hardware accelerators such as GPUs are essential for the development of Deep Learning (DL) models - as their training process is compute-intensive. A growing number of organisations have employed expensive multi-tenant GPU clusters to run distributed DL training jobs. Efficient job schedulers are re...

Full description

Saved in:
Bibliographic Details
Main Author: Amrita, Ravishankar
Other Authors: Zhang Tianwei
Format: Final Year Project
Language:English
Published: Nanyang Technological University 2022
Subjects:
Online Access:https://hdl.handle.net/10356/156566
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English