Distributed machine learning on public clouds
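Machine learning (ML) aims to construct predictive models from example input data. Conventional ML systems such as Caffe can achieve acceptable model training times on a single machine when dealing with a moderate amount of data. However, they may not be able to cope with very large training data sets, such as ImageNet and Yahoo News Feed, which can contain hundreds of millions of records. Several distributed ML systems have been proposed to reduce model training time. However, the behavior of these systems on heterogeneous infrastructures such as public clouds, e.g., Amazon EC2, Google Compute Engine (GCE), or Microsoft Azure, has not been thoroughly investigated. This project examines the performance of popular distributed ML systems, such as Distributed TensorFlow and Horovod, on Amazon Web Services.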
Main Author: | Lim, Ernest Woon Teng |
---|---|
Other Authors: | Ta Nguyen Binh Duong |
Format: | Final Year Project |
Language: | English |
Published: | 2019 |
Subjects: | DRNTU::Engineering::Computer science and engineering |
Online Access: | http://hdl.handle.net/10356/76892 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-76892 |
---|---|
record_format | dspace |
spelling | sg-ntu-dr.10356-76892; 2023-03-03T20:54:27Z; Distributed machine learning on public clouds; Lim, Ernest Woon Teng; Ta Nguyen Binh Duong; School of Computer Science and Engineering; DRNTU::Engineering::Computer science and engineering; Machine learning (ML) aims to construct predictive models from example input data. Conventional ML systems such as Caffe can achieve acceptable model training times on a single machine when dealing with a moderate amount of data. However, they may not be able to cope with very large training data sets, such as ImageNet and Yahoo News Feed, which can contain hundreds of millions of records. Several distributed ML systems have been proposed to reduce model training time. However, the behavior of these systems on heterogeneous infrastructures such as public clouds, e.g., Amazon EC2, Google Compute Engine (GCE), or Microsoft Azure, has not been thoroughly investigated. This project examines the performance of popular distributed ML systems, such as Distributed TensorFlow and Horovod, on Amazon Web Services.; Bachelor of Engineering (Computer Science); 2019-04-22T13:06:58Z; 2019-04-22T13:06:58Z; 2019; Final Year Project (FYP); http://hdl.handle.net/10356/76892; en; Nanyang Technological University; 33 p.; application/pdf |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | DRNTU::Engineering::Computer science and engineering |
spellingShingle | DRNTU::Engineering::Computer science and engineering; Lim, Ernest Woon Teng; Distributed machine learning on public clouds |
description | Machine learning (ML) aims to construct predictive models from example input data. Conventional ML systems such as Caffe can achieve acceptable model training times on a single machine when dealing with a moderate amount of data. However, they may not be able to cope with very large training data sets, such as ImageNet and Yahoo News Feed, which can contain hundreds of millions of records. Several distributed ML systems have been proposed to reduce model training time. However, the behavior of these systems on heterogeneous infrastructures such as public clouds, e.g., Amazon EC2, Google Compute Engine (GCE), or Microsoft Azure, has not been thoroughly investigated. This project examines the performance of popular distributed ML systems, such as Distributed TensorFlow and Horovod, on Amazon Web Services. |
author2 | Ta Nguyen Binh Duong |
author_facet | Ta Nguyen Binh Duong; Lim, Ernest Woon Teng |
format | Final Year Project |
author | Lim, Ernest Woon Teng |
author_sort | Lim, Ernest Woon Teng |
title | Distributed machine learning on public clouds |
title_short | Distributed machine learning on public clouds |
title_full | Distributed machine learning on public clouds |
title_fullStr | Distributed machine learning on public clouds |
title_full_unstemmed | Distributed machine learning on public clouds |
title_sort | distributed machine learning on public clouds |
publishDate | 2019 |
url | http://hdl.handle.net/10356/76892 |
_version_ | 1759858415232876544 |