Distributed machine learning on IaaS clouds

Bibliographic Details
Main Authors: TA, Nguyen Binh Duong, NGUYEN, Quang Sang
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2018
Online Access: https://ink.library.smu.edu.sg/sis_research/4832
Institution: Singapore Management University
Description
Summary: Training complex machine learning (ML) models with large datasets requires powerful computing infrastructure, which is costly to acquire and maintain. As a result, ML researchers turn to the cloud for on-demand and elastic resource provisioning. Two issues have arisen from this trend: 1) if not configured properly, training ML models on the cloud can incur significant cost and time, and 2) many ML researchers focus on model and algorithm development, so they may lack the time or skills to deal with system setup, resource selection and configuration. In this work, we propose and implement FC²: a web service for fast, convenient and cost-effective distributed ML model training over public cloud resources. Central to the effectiveness of FC² is its ability to recommend an appropriate resource configuration, in terms of cost and execution time, for a given ML training task. Extensive experiments with real-world deep neural network models and datasets demonstrate the effectiveness of our solution.
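
The summary does not say how the recommendation is computed. As a rough illustration of what choosing a configuration by cost and execution time can mean, the sketch below picks the cheapest candidate whose estimated training time fits a deadline. It is not the authors' method: the `Config` class, the `recommend` function, the configuration names, prices and time estimates are all hypothetical and made up for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Config:
    """A candidate cluster configuration (all values are illustrative)."""
    name: str              # hypothetical label, not a real instance type
    workers: int           # number of worker machines
    price_per_hour: float  # price per worker per hour (made-up figure)
    est_hours: float       # estimated training time (made-up figure)

    @property
    def cost(self) -> float:
        # Total cost = workers x hourly price x estimated duration.
        return self.workers * self.price_per_hour * self.est_hours


def recommend(configs: Iterable[Config], deadline_hours: float) -> Optional[Config]:
    """Return the cheapest configuration that meets the deadline.

    Ties on cost are broken in favour of the shorter estimated time.
    Returns None when no candidate finishes within the deadline.
    """
    feasible = [c for c in configs if c.est_hours <= deadline_hours]
    return min(feasible, key=lambda c: (c.cost, c.est_hours), default=None)


# Hypothetical candidates: more or larger workers finish sooner but cost more.
candidates = [
    Config("small-x2", workers=2, price_per_hour=0.5, est_hours=10.0),
    Config("small-x4", workers=4, price_per_hour=0.5, est_hours=6.0),
    Config("large-x4", workers=4, price_per_hour=2.0, est_hours=2.0),
]

if __name__ == "__main__":
    choice = recommend(candidates, deadline_hours=8.0)
    print(choice.name if choice else "no feasible configuration")
```

In practice the time estimates would have to come from profiling or a performance model, and other trade-offs (for example, minimising time under a budget) are equally plausible; the deadline-constrained form is only one assumption.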