THE EFFECT OF PERFORMANCE METRICS MODIFICATION ON HYBRID RESOURCE SCHEDULER FOR DISTRIBUTED DEEP LEARNING TRAINING

Distributed deep learning is a method in machine learning that is used for complex and time-consuming feature extraction. One of the frameworks that is used to perform distributed machine learning is AdaptDL. AdaptDL runs machine learning processes on top of a Kubernetes cluster using the Poll...

Bibliographic Details
Main Author: Aptanagi, Pandyaka
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/55924
Institution: Institut Teknologi Bandung
id id-itb.:55924
timestamp 2021-06-20T08:02:30Z
keywords Kubernetes, AdaptDL, Pollux, deep learning, hybrid resource scheduler
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Distributed deep learning is a machine learning method used for complex, time-consuming feature extraction. One framework for distributed machine learning is AdaptDL, which runs machine learning jobs on a Kubernetes cluster using the Pollux scheduler. When making scheduling decisions, Pollux offers only one performance metric, Goodput, with no alternatives. Pollux also has the potential to maximize training speed by changing the Goodput value, and to use resources more efficiently by changing the threshold value that determines the scaling mechanism. In this research, AdaptDL was extended with a choice of performance metrics, a metric to maximize speed, and a metric to increase resource efficiency. The performance metric options were implemented in the AdaptDL framework, the speed metric was implemented by modifying the Goodput equation, and the efficiency metric was implemented by modifying Pollux. In tests using an image classification model on the MNIST dataset, these changes did not affect the accuracy of the resulting model but did affect other aspects of performance. Adding the performance metric options did not affect overall training performance. The modified speed metric slowed training by 16.099%. The modified resource efficiency metric slowed training by 106.977%, increased resource generation time by 80%, and increased resource usage by 19.31%.
format Final Project
author Aptanagi, Pandyaka
title THE EFFECT OF PERFORMANCE METRICS MODIFICATION ON HYBRID RESOURCE SCHEDULER FOR DISTRIBUTED DEEP LEARNING TRAINING
url https://digilib.itb.ac.id/gdl/view/55924
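
The description above refers to the Goodput metric and reports slowdowns as percentages. The sketch below is illustrative only and is not taken from the thesis or from the AdaptDL source. It assumes the usual Pollux definition of Goodput as system throughput multiplied by statistical efficiency, and it assumes that "slowed down by X%" means training time grew by X% over the baseline. The function names and all input numbers are hypothetical.

```python
def goodput(throughput: float, statistical_efficiency: float) -> float:
    """Goodput as throughput (examples/s) times statistical efficiency.

    Assumption: statistical efficiency is a fraction in [0, 1] that
    discounts throughput when a larger batch makes less progress per example.
    """
    if throughput < 0:
        raise ValueError("throughput must be non-negative")
    if not 0.0 <= statistical_efficiency <= 1.0:
        raise ValueError("statistical_efficiency must be in [0, 1]")
    return throughput * statistical_efficiency


def slowed_duration(baseline_seconds: float, slowdown_percent: float) -> float:
    """Training time after a slowdown, read as a percentage increase in time."""
    return baseline_seconds * (1.0 + slowdown_percent / 100.0)


if __name__ == "__main__":
    # Hypothetical configurations: more raw throughput does not always
    # mean more goodput once statistical efficiency is taken into account.
    small = goodput(throughput=1000.0, statistical_efficiency=0.9)
    large = goodput(throughput=1800.0, statistical_efficiency=0.45)
    print("small config goodput:", small)
    print("large config goodput:", large)

    # Slowdown percentages reported in the description, applied to a
    # hypothetical 100-second baseline run.
    for label, pct in [("speed metric", 16.099), ("efficiency metric", 106.977)]:
        print(label, "->", round(slowed_duration(100.0, pct), 3), "seconds")
```

Under this reading, a slowdown of 106.977% means the run takes a little more than twice as long as the baseline.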