THE EFFECT OF PERFORMANCE METRICS MODIFICATION ON HYBRID RESOURCE SCHEDULER FOR DISTRIBUTED DEEP LEARNING TRAINING
Main Author: Aptanagi, Pandyaka
Format: Final Project
Language: Indonesia
Keywords: Kubernetes, AdaptDL, Pollux, deep learning, hybrid resource scheduler
Institution: Institut Teknologi Bandung
Online Access: https://digilib.itb.ac.id/gdl/view/55924
Description:
Distributed deep learning is a machine learning approach used for complex, time-consuming feature extraction. One framework for performing distributed training is AdaptDL, which runs training jobs on top of a Kubernetes cluster using the Pollux scheduler. When making scheduling decisions, Pollux provides only a single performance metric, Goodput, and offers no other options. Pollux also leaves room to maximize training speed by changing how the Goodput value is computed, and to use resources more efficiently by changing the threshold that governs its scaling mechanism. In this research, AdaptDL was extended with selectable performance metrics: a metric intended to maximize speed and a metric intended to improve resource efficiency. The metric options were implemented in the AdaptDL framework, the speed metric was realized by modifying the Goodput equation, and the efficiency metric was implemented by modifying the Pollux scheduler. Based on tests with an image classification model on the MNIST dataset, these additions and modifications did not affect the accuracy of the resulting model but did affect other aspects of performance. Adding the performance metric options had no effect on overall training performance. The speed-oriented modification slowed training by 16.099%. The efficiency-oriented modification slowed training by 106.977%, increased resource creation time by 80%, and increased resource usage by 19.31%.
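
For reference, the Goodput metric that this work modifies is the one defined in the Pollux paper (Qiao et al., OSDI 2021): the product of a job's system throughput and its statistical efficiency. A sketch of that formulation, using the paper's symbols rather than anything taken from this thesis, is:

    % a = resource allocation vector, m = per-replica batch size,
    % s = gradient accumulation steps, M = resulting total batch size
    \mathrm{GOODPUT}_t(a, m, s) = \mathrm{THROUGHPUT}(a, m, s) \times \mathrm{EFFICIENCY}_t(M),
    \qquad M = m \cdot s \cdot \sum_{n} a_n

Maximizing this quantity lets the scheduler trade raw examples-per-second against how much each processed example contributes to convergence; the speed and efficiency metrics described above adjust this equation and the scaling threshold built on it, respectively.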
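
As a concrete illustration of the kind of job such a scheduler manages, below is a minimal sketch of an MNIST training script written against the public adaptdl.torch API. It is not code from this thesis; the model, hyperparameters, and backend choice are placeholder assumptions, and the API usage follows the AdaptDL documentation rather than the modified framework described above.

    # Minimal AdaptDL-style MNIST job (illustrative sketch, not from the thesis).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision import datasets, transforms
    import adaptdl.torch

    def main():
        # Join the AdaptDL/Pollux-managed process group ("nccl" when GPUs are available).
        adaptdl.torch.init_process_group("gloo")

        model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

        # AdaptiveDataParallel lets Pollux add or remove replicas while training runs.
        model = adaptdl.torch.AdaptiveDataParallel(model, optimizer)

        dataset = datasets.MNIST("./data", train=True, download=True,
                                 transform=transforms.ToTensor())
        loader = adaptdl.torch.AdaptiveDataLoader(dataset, batch_size=64, shuffle=True)
        loader.autoscale_batch_size(1024)  # let the scheduler tune the global batch size

        # remaining_epochs_until resumes at the correct epoch after a rescale/restart.
        for epoch in adaptdl.torch.remaining_epochs_until(5):
            for data, target in loader:
                optimizer.zero_grad()
                loss = F.cross_entropy(model(data), target)
                loss.backward()
                optimizer.step()

    if __name__ == "__main__":
        main()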