FC2: Cloud-based cluster provisioning for distributed machine learning

Training large, complex machine learning models such as deep neural networks with big data requires powerful computing clusters, which are costly to acquire, use and maintain. As a result, many machine learning researchers turn to cloud computing services for on-demand and elastic resource provision...

Bibliographic Details
Main Author: TA, Nguyen Binh Duong
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2019
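Subjects: Distributed machine learning; Cloud-based clusters; Resource recommendation; Cluster deployment; Computer Engineering; Software Engineering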
Online Access: https://ink.library.smu.edu.sg/sis_research/4763
https://ink.library.smu.edu.sg/context/sis_research/article/5766/viewcontent/Ta2019_Article_FC2FC2Cloud_basedClusterProvis.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-5766
record_format dspace
spelling sg-smu-ink.sis_research-5766 2020-01-16T10:28:06Z 2019-02-08T08:00:00Z text application/pdf info:doi/10.1007%2Fs10586-019-02912-6 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Distributed machine learning
Cloud-based clusters
Resource recommendation
Cluster deployment
Computer Engineering
Software Engineering
description Training large, complex machine learning models such as deep neural networks on big data requires powerful computing clusters, which are costly to acquire, use and maintain. As a result, many machine learning researchers turn to cloud computing services for their on-demand, elastic resource provisioning. Two issues have arisen from this trend: (1) if not configured properly, training models on cloud-based clusters can incur significant cost and time, and (2) many machine learning researchers focus on model and algorithm development, so they may not have the time or skills to deal with system setup, resource selection and configuration. In this work, we propose and implement FC2, a system for fast, convenient and cost-effective distributed machine learning over public cloud resources. Central to the effectiveness of FC2 is its ability to recommend an appropriate resource configuration, in terms of cost and execution time, for a given model training task. Our approach differs from previous work in that it does not require manual analysis of the training task's code and dataset in advance. FC2 then deploys and manages the recommended resource configuration automatically until the training task is completed. We have conducted extensive experiments with an implementation of FC2, using real-world deep neural network models and datasets. The results demonstrate the effectiveness of our approach, which can produce cost savings of up to 80% while maintaining training performance similar to that of much more expensive resource configurations.
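The description above mentions recommending a resource configuration in terms of cost and execution time. The record does not say how FC2 computes its recommendation, so the following Python sketch only illustrates the general trade-off and is not FC2's method. It assumes that a time estimate is already available for each candidate cluster; the instance labels, prices, time estimates, and the `max_slowdown` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One hypothetical cluster configuration (all values are made up)."""
    instance_type: str       # illustrative label, not a real cloud SKU
    num_nodes: int
    hourly_price_usd: float  # assumed price per node per hour
    est_hours: float         # assumed estimate of total training time

    @property
    def est_cost_usd(self) -> float:
        # Cost model assumption: every node is billed for the whole run.
        return self.num_nodes * self.hourly_price_usd * self.est_hours


def recommend(candidates, max_slowdown=1.2):
    """Return the cheapest candidate whose estimated time is within
    max_slowdown times the fastest candidate's estimated time."""
    fastest = min(c.est_hours for c in candidates)
    eligible = [c for c in candidates if c.est_hours <= fastest * max_slowdown]
    return min(eligible, key=lambda c: c.est_cost_usd)


candidates = [
    Candidate("small-cpu", num_nodes=8, hourly_price_usd=0.40, est_hours=10.0),
    Candidate("mid-gpu", num_nodes=2, hourly_price_usd=1.00, est_hours=4.5),
    Candidate("large-gpu", num_nodes=4, hourly_price_usd=3.00, est_hours=4.0),
]

best = recommend(candidates)
print(best.instance_type, best.num_nodes, best.est_cost_usd)
```

With these made-up inputs, the sketch rejects the slow CPU cluster and then picks the cheaper of the two configurations that finish in a similar time. A real system would also need a way to obtain the time estimates, which this sketch takes as given.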
format text
author TA, Nguyen Binh Duong
title FC2: Cloud-based cluster provisioning for distributed machine learning
publisher Institutional Knowledge at Singapore Management University
publishDate 2019