Towards distributed machine learning in shared clusters: A dynamically-partitioned approach
Many cluster management systems (CMSs) have been proposed to share a single cluster among multiple distributed computing systems. However, none of the existing approaches can handle distributed machine learning (ML) workloads given the following criteria: high resource utilization, fair resource allocation and low sharing overhead. To solve this problem, we propose a new CMS named Dorm, incorporating a dynamically-partitioned cluster management mechanism and a utilization-fairness optimizer. Specifically, Dorm uses the container-based virtualization technique to partition a cluster, runs one application per partition, and can dynamically resize each partition at application runtime for resource efficiency and fairness. Each application directly launches its tasks on the assigned partition without petitioning for resources frequently, so Dorm imposes flat sharing overhead. Extensive performance evaluations showed that Dorm could simultaneously increase the resource utilization by a factor of up to 2.32, reduce the fairness loss by a factor of up to 1.52, and speed up popular distributed ML applications by a factor of up to 2.72, compared to existing approaches. Dorm's sharing overhead is less than 5% in most cases.
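The core mechanism in the abstract (one container-backed partition per application, resized at runtime to trade utilization against fairness) can be illustrated with a toy sketch. All function names, the greedy resize rule, and the fairness measure below are hypothetical illustrations, not the paper's actual optimizer.

```python
# Toy sketch of the utilization-fairness trade-off that a dynamically-
# partitioned CMS like Dorm must balance when resizing partitions.
# The greedy rule and names here are hypothetical, not the paper's method.

def fairness_loss(alloc, demands, fair_share):
    """Total shortfall of apps below their fair share, ignoring demand
    an app could not use anyway (capped at its own demand)."""
    return sum(max(min(d, fair_share) - a, 0) for a, d in zip(alloc, demands))

def resize_partitions(demands, capacity):
    """Grant each app up to an equal fair share, then hand leftover
    capacity to the apps with the largest unmet demand (utilization)."""
    n = len(demands)
    fair_share = capacity / n
    alloc = [min(d, fair_share) for d in demands]
    leftover = capacity - sum(alloc)
    for i in sorted(range(n), key=lambda i: demands[i] - alloc[i], reverse=True):
        grant = min(demands[i] - alloc[i], leftover)
        alloc[i] += grant
        leftover -= grant
    return alloc

demands = [60.0, 30.0, 10.0]      # CPU cores each ML application could use
capacity = 90                     # total cores in the shared cluster
alloc = resize_partitions(demands, capacity)
print(alloc)                      # [50.0, 30.0, 10.0]
print(sum(alloc) / capacity)      # 1.0 -> full utilization
print(fairness_loss(alloc, demands, capacity / len(demands)))  # 0.0
```

Under contention (e.g., all three apps demanding 60 cores), the same rule degrades to equal shares of 30 each, which is the fairness side of the trade-off.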
Main Authors: SUN, Peng; WEN, Yonggang; TA, Nguyen Binh Duong; YAN, Shengen
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2017
Subjects: Artificial Intelligence and Robotics; Software Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/4766 https://ink.library.smu.edu.sg/context/sis_research/article/5769/viewcontent/1704.06738.pdf
Institution: Singapore Management University
id |
sg-smu-ink.sis_research-5769 |
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-5769 2020-01-16T10:27:06Z Towards distributed machine learning in shared clusters: A dynamically-partitioned approach SUN, Peng WEN, Yonggang TA, Nguyen Binh Duong YAN, Shengen Many cluster management systems (CMSs) have been proposed to share a single cluster with multiple distributed computing systems. However, none of the existing approaches can handle distributed machine learning (ML) workloads given the following criteria: high resource utilization, fair resource allocation and low sharing overhead. To solve this problem, we propose a new CMS named Dorm, incorporating a dynamically-partitioned cluster management mechanism and a utilization-fairness optimizer. Specifically, Dorm uses the container-based virtualization technique to partition a cluster, runs one application per partition, and can dynamically resize each partition at application runtime for resource efficiency and fairness. Each application directly launches its tasks on the assigned partition without petitioning for resources frequently, so Dorm imposes flat sharing overhead. Extensive performance evaluations showed that Dorm could simultaneously increase the resource utilization by a factor of up to 2.32, reduce the fairness loss by a factor of up to 1.52, and speed up popular distributed ML applications by a factor of up to 2.72, compared to existing approaches. Dorm’s sharing overhead is less than 5% in most cases. Index Terms—Cluster Resource Management, Distributed Machine Learning, Fairness 2017-05-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/4766 info:doi/10.1109/SMARTCOMP.2017.7947053 https://ink.library.smu.edu.sg/context/sis_research/article/5769/viewcontent/1704.06738.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Artificial Intelligence and Robotics Software Engineering |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Artificial Intelligence and Robotics; Software Engineering |
spellingShingle |
Artificial Intelligence and Robotics; Software Engineering; SUN, Peng; WEN, Yonggang; TA, Nguyen Binh Duong; YAN, Shengen; Towards distributed machine learning in shared clusters: A dynamically-partitioned approach |
description |
Many cluster management systems (CMSs) have been proposed to share a single cluster with multiple distributed computing systems. However, none of the existing approaches can handle distributed machine learning (ML) workloads given the following criteria: high resource utilization, fair resource allocation and low sharing overhead. To solve this problem, we propose a new CMS named Dorm, incorporating a dynamically-partitioned cluster management mechanism and a utilization-fairness optimizer. Specifically, Dorm uses the container-based virtualization technique to partition a cluster, runs one application per partition, and can dynamically resize each partition at application runtime for resource efficiency and fairness. Each application directly launches its tasks on the assigned partition without petitioning for resources frequently, so Dorm imposes flat sharing overhead. Extensive performance evaluations showed that Dorm could simultaneously increase the resource utilization by a factor of up to 2.32, reduce the fairness loss by a factor of up to 1.52, and speed up popular distributed ML applications by a factor of up to 2.72, compared to existing approaches. Dorm’s sharing overhead is less than 5% in most cases. Index Terms—Cluster Resource Management, Distributed Machine Learning, Fairness |
format |
text |
author |
SUN, Peng; WEN, Yonggang; TA, Nguyen Binh Duong; YAN, Shengen |
author_facet |
SUN, Peng; WEN, Yonggang; TA, Nguyen Binh Duong; YAN, Shengen |
author_sort |
SUN, Peng |
title |
Towards distributed machine learning in shared clusters: A dynamically-partitioned approach |
title_short |
Towards distributed machine learning in shared clusters: A dynamically-partitioned approach |
title_full |
Towards distributed machine learning in shared clusters: A dynamically-partitioned approach |
title_fullStr |
Towards distributed machine learning in shared clusters: A dynamically-partitioned approach |
title_full_unstemmed |
Towards distributed machine learning in shared clusters: A dynamically-partitioned approach |
title_sort |
towards distributed machine learning in shared clusters: a dynamically-partitioned approach |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2017 |
url |
https://ink.library.smu.edu.sg/sis_research/4766 https://ink.library.smu.edu.sg/context/sis_research/article/5769/viewcontent/1704.06738.pdf |
_version_ |
1770575025282220032 |