THE EFFECT OF PERFORMANCE METRICS MODIFICATION ON HYBRID RESOURCE SCHEDULER FOR DISTRIBUTED DEEP LEARNING TRAINING
Main Author: Aptanagi, Pandyaka
Format: Final Project
Language: Indonesia
Keywords: Kubernetes, AdaptDL, Pollux, deep learning, hybrid resource scheduler
Institution: Institut Teknologi Bandung
Online Access: https://digilib.itb.ac.id/gdl/view/55924
Description:
Distributed deep learning is a machine learning approach used for complex, time-consuming feature extraction. One framework for performing distributed training is AdaptDL, which runs training jobs on top of a Kubernetes cluster using the Pollux scheduler. When making scheduling decisions, Pollux provides only a single performance metric, Goodput, and offers no other options. Pollux also leaves room to maximize training speed by changing how the Goodput value is computed, and to use resources more efficiently by changing the threshold that governs its scaling mechanism. In this research, AdaptDL was extended with selectable performance metrics: a metric intended to maximize speed and a metric intended to improve resource efficiency. The metric options were implemented in the AdaptDL framework, the speed metric was realized by modifying the Goodput equation, and the efficiency metric was implemented by modifying the Pollux scheduler. Based on tests with an image classification model on the MNIST dataset, these additions and modifications did not affect the accuracy of the resulting model but did affect other aspects of performance. Adding the performance metric options had no effect on overall training performance. The speed-oriented modification slowed training by 16.099%. The efficiency-oriented modification slowed training by 106.977%, increased resource creation time by 80%, and increased resource usage by 19.31%.
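
For reference, the Goodput metric that this work modifies is the one defined in the Pollux paper (Qiao et al., OSDI 2021): the product of a job's system throughput and its statistical efficiency. A sketch of that formulation, using the paper's symbols rather than anything taken from this thesis, is:

    % a = resource allocation vector, m = per-replica batch size,
    % s = gradient accumulation steps, M = resulting total batch size
    \mathrm{GOODPUT}_t(a, m, s) = \mathrm{THROUGHPUT}(a, m, s) \times \mathrm{EFFICIENCY}_t(M),
    \qquad M = m \cdot s \cdot \sum_{n} a_n

Maximizing this quantity lets the scheduler trade raw examples-per-second against how much each processed example contributes to convergence; the speed and efficiency metrics described above adjust this equation and the scaling threshold built on it, respectively.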
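
As a concrete illustration of the kind of job such a scheduler manages, below is a minimal sketch of an MNIST training script written against the public adaptdl.torch API. It is not code from this thesis; the model, hyperparameters, and backend choice are placeholder assumptions, and the API usage follows the AdaptDL documentation rather than the modified framework described above.

    # Minimal AdaptDL-style MNIST job (illustrative sketch, not from the thesis).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision import datasets, transforms
    import adaptdl.torch

    def main():
        # Join the AdaptDL/Pollux-managed process group ("nccl" when GPUs are available).
        adaptdl.torch.init_process_group("gloo")

        model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

        # AdaptiveDataParallel lets Pollux add or remove replicas while training runs.
        model = adaptdl.torch.AdaptiveDataParallel(model, optimizer)

        dataset = datasets.MNIST("./data", train=True, download=True,
                                 transform=transforms.ToTensor())
        loader = adaptdl.torch.AdaptiveDataLoader(dataset, batch_size=64, shuffle=True)
        loader.autoscale_batch_size(1024)  # let the scheduler tune the global batch size

        # remaining_epochs_until resumes at the correct epoch after a rescale/restart.
        for epoch in adaptdl.torch.remaining_epochs_until(5):
            for data, target in loader:
                optimizer.zero_grad()
                loss = F.cross_entropy(model(data), target)
                loss.backward()
                optimizer.step()

    if __name__ == "__main__":
        main()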