WORKER BALANCING IMPLEMENTATION ON DRAGON SCHEDULER FOR DISTRIBUTED DEEP LEARNING IN KUBERNETES

Deep learning generally involves far more computation than conventional machine learning, so it requires a lot of time for the training process. Distributed deep learning is an alternative approach that reduces training time by distributing the computational load across multiple machines....


Bibliographic Details
Main Author: Prima Yoriko, Naufal
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/65753
Institution: Institut Teknologi Bandung
Language: Indonesia
id id-itb.:65753
spelling id-itb.:657532022-06-24T14:59:25ZWORKER BALANCING IMPLEMENTATION ON DRAGON SCHEDULER FOR DISTRIBUTED DEEP LEARNING IN KUBERNETES Prima Yoriko, Naufal Indonesia Final Project DRAGON scheduler, worker balancing, deep learning, parameter server, job scheduling, Kubernetes, Tensorflow INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/65753 Deep learning generally involves far more computation than conventional machine learning, so it requires a lot of time for the training process. Distributed deep learning is an alternative approach that reduces training time by distributing the computational load across multiple machines. The DRAGON scheduler is used to schedule various distributed training jobs that use the parameter server architecture with TensorFlow on a Kubernetes cluster. Its advantage is the ability to scale the number of workers of a training job depending on the availability of resources in the cluster. In the existing scaling implementation, the process of adding and removing workers focuses on one job at a time. However, this implementation turned out to be inefficient in terms of training duration because of a limitation of the parameter server architecture. Due to this limitation, this Final Project modifies the scaling process in the DRAGON scheduler by implementing worker balancing. With worker balancing, training duration can be reduced by 16.305% while maintaining the prediction accuracy of the training results. text
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Deep learning generally involves far more computation than conventional machine learning, so it requires a lot of time for the training process. Distributed deep learning is an alternative approach that reduces training time by distributing the computational load across multiple machines. The DRAGON scheduler is used to schedule various distributed training jobs that use the parameter server architecture with TensorFlow on a Kubernetes cluster. Its advantage is the ability to scale the number of workers of a training job depending on the availability of resources in the cluster. In the existing scaling implementation, the process of adding and removing workers focuses on one job at a time. However, this implementation turned out to be inefficient in terms of training duration because of a limitation of the parameter server architecture. Due to this limitation, this Final Project modifies the scaling process in the DRAGON scheduler by implementing worker balancing. With worker balancing, training duration can be reduced by 16.305% while maintaining the prediction accuracy of the training results.
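The abstract contrasts the original DRAGON scaling behaviour (all spare worker slots granted to one job at a time) with the worker balancing proposed in this Final Project (slots spread across jobs). The record does not include the actual implementation, so the following is only a minimal illustrative sketch of the balancing idea; the function name, the dict-based job representation, and the round-robin policy are assumptions for illustration, not the thesis's actual algorithm.

```python
# Hypothetical sketch of the worker-balancing idea: rather than granting all
# spare worker slots to a single training job, spread them round-robin across
# all running jobs, since the parameter server architecture yields diminishing
# returns as a single job's worker count grows.

def balance_workers(jobs, free_slots):
    """Distribute free worker slots round-robin across jobs.

    jobs:       dict mapping job name -> current worker count
    free_slots: number of additional workers the cluster can host
    Returns a new dict with the balanced worker counts.
    """
    balanced = dict(jobs)
    names = sorted(balanced)  # deterministic order for this sketch
    i = 0
    while free_slots > 0 and names:
        balanced[names[i % len(names)]] += 1
        i += 1
        free_slots -= 1
    return balanced

# Example: 4 spare slots are shared by two jobs instead of all going to one.
print(balance_workers({"job-a": 1, "job-b": 1}, 4))
# {'job-a': 3, 'job-b': 3}
```

In the real scheduler this decision would be driven by Kubernetes resource availability and per-job constraints; the sketch only captures the even-distribution policy that distinguishes worker balancing from one-job-first scaling.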
format Final Project
author Prima Yoriko, Naufal
spellingShingle Prima Yoriko, Naufal
WORKER BALANCING IMPLEMENTATION ON DRAGON SCHEDULER FOR DISTRIBUTED DEEP LEARNING IN KUBERNETES
author_facet Prima Yoriko, Naufal
author_sort Prima Yoriko, Naufal
title WORKER BALANCING IMPLEMENTATION ON DRAGON SCHEDULER FOR DISTRIBUTED DEEP LEARNING IN KUBERNETES
title_short WORKER BALANCING IMPLEMENTATION ON DRAGON SCHEDULER FOR DISTRIBUTED DEEP LEARNING IN KUBERNETES
title_full WORKER BALANCING IMPLEMENTATION ON DRAGON SCHEDULER FOR DISTRIBUTED DEEP LEARNING IN KUBERNETES
title_fullStr WORKER BALANCING IMPLEMENTATION ON DRAGON SCHEDULER FOR DISTRIBUTED DEEP LEARNING IN KUBERNETES
title_full_unstemmed WORKER BALANCING IMPLEMENTATION ON DRAGON SCHEDULER FOR DISTRIBUTED DEEP LEARNING IN KUBERNETES
title_sort worker balancing implementation on dragon scheduler for distributed deep learning in kubernetes
url https://digilib.itb.ac.id/gdl/view/65753
_version_ 1822932842453139456