Towards distributed machine learning in shared clusters: A dynamically-partitioned approach

Many cluster management systems (CMSs) have been proposed to share a single cluster with multiple distributed computing systems. However, none of the existing approaches can handle distributed machine learning (ML) workloads given the following criteria: high resource utilization, fair resource allocation and low sharing overhead. To solve this problem, we propose a new CMS named Dorm, incorporating a dynamically-partitioned cluster management mechanism and a utilization-fairness optimizer. Specifically, Dorm uses the container-based virtualization technique to partition a cluster, runs one application per partition, and can dynamically resize each partition at application runtime for resource efficiency and fairness. Each application directly launches its tasks on the assigned partition without petitioning for resources frequently, so Dorm imposes flat sharing overhead. Extensive performance evaluations showed that Dorm could simultaneously increase the resource utilization by a factor of up to 2.32, reduce the fairness loss by a factor of up to 1.52, and speed up popular distributed ML applications by a factor of up to 2.72, compared to existing approaches. Dorm's sharing overhead is less than 5% in most cases.
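The abstract describes dynamically resizing per-application partitions to balance utilization and fairness, but does not reproduce Dorm's actual optimizer. As a hypothetical illustration only, a max-min fair "water-filling" allocation (one common fairness criterion, not confirmed to be the one Dorm uses) could be sketched as:

```python
def max_min_fair(capacity, demands):
    """Toy max-min fair allocator: repeatedly split the remaining
    capacity equally among unsatisfied applications, capping each
    share at that application's demand. Hypothetical stand-in for a
    utilization-fairness optimizer; not Dorm's actual algorithm."""
    alloc = {app: 0.0 for app in demands}
    remaining = float(capacity)
    unsatisfied = dict(demands)  # app -> demand still unmet
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        for app in list(unsatisfied):
            give = min(share, unsatisfied[app])
            alloc[app] += give
            remaining -= give
            unsatisfied[app] -= give
            if unsatisfied[app] <= 1e-9:
                del unsatisfied[app]  # demand fully met
    return alloc

# Example: 100 CPU cores shared by three ML jobs. The small job is
# fully satisfied; the two larger jobs split what remains equally.
print(max_min_fair(100, {"a": 20, "b": 50, "c": 60}))
```

Under max-min fairness, no application can gain resources without taking them from one with an equal or smaller allocation, which matches the fairness-loss framing in the abstract at a high level.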


Saved in:
Bibliographic Details
Main Authors: SUN, Peng, WEN, Yonggang, TA, Nguyen Binh Duong, YAN, Shengen
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2017
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/4766
https://ink.library.smu.edu.sg/context/sis_research/article/5769/viewcontent/1704.06738.pdf
Institution: Singapore Management University
Language: English
id sg-smu-ink.sis_research-5769
record_format dspace
spelling sg-smu-ink.sis_research-57692020-01-16T10:27:06Z Towards distributed machine learning in shared clusters: A dynamically-partitioned approach SUN, Peng WEN, Yonggang TA, Nguyen Binh Duong YAN, Shengen Many cluster management systems (CMSs) have been proposed to share a single cluster with multiple distributed computing systems. However, none of the existing approaches can handle distributed machine learning (ML) workloads given the following criteria: high resource utilization, fair resource allocation and low sharing overhead. To solve this problem, we propose a new CMS named Dorm, incorporating a dynamically-partitioned cluster management mechanism and a utilization-fairness optimizer. Specifically, Dorm uses the container-based virtualization technique to partition a cluster, runs one application per partition, and can dynamically resize each partition at application runtime for resource efficiency and fairness. Each application directly launches its tasks on the assigned partition without petitioning for resources frequently, so Dorm imposes flat sharing overhead. Extensive performance evaluations showed that Dorm could simultaneously increase the resource utilization by a factor of up to 2.32, reduce the fairness loss by a factor of up to 1.52, and speed up popular distributed ML applications by a factor of up to 2.72, compared to existing approaches. Dorm's sharing overhead is less than 5% in most cases. Index Terms—Cluster Resource Management, Distributed Machine Learning, Fairness 2017-05-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/4766 info:doi/10.1109/SMARTCOMP.2017.7947053 https://ink.library.smu.edu.sg/context/sis_research/article/5769/viewcontent/1704.06738.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Artificial Intelligence and Robotics Software Engineering
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Artificial Intelligence and Robotics
Software Engineering
spellingShingle Artificial Intelligence and Robotics
Software Engineering
SUN, Peng
WEN, Yonggang
TA, Nguyen Binh Duong
YAN, Shengen
Towards distributed machine learning in shared clusters: A dynamically-partitioned approach
description Many cluster management systems (CMSs) have been proposed to share a single cluster with multiple distributed computing systems. However, none of the existing approaches can handle distributed machine learning (ML) workloads given the following criteria: high resource utilization, fair resource allocation and low sharing overhead. To solve this problem, we propose a new CMS named Dorm, incorporating a dynamically-partitioned cluster management mechanism and a utilization-fairness optimizer. Specifically, Dorm uses the container-based virtualization technique to partition a cluster, runs one application per partition, and can dynamically resize each partition at application runtime for resource efficiency and fairness. Each application directly launches its tasks on the assigned partition without petitioning for resources frequently, so Dorm imposes flat sharing overhead. Extensive performance evaluations showed that Dorm could simultaneously increase the resource utilization by a factor of up to 2.32, reduce the fairness loss by a factor of up to 1.52, and speed up popular distributed ML applications by a factor of up to 2.72, compared to existing approaches. Dorm's sharing overhead is less than 5% in most cases. Index Terms—Cluster Resource Management, Distributed Machine Learning, Fairness
format text
author SUN, Peng
WEN, Yonggang
TA, Nguyen Binh Duong
YAN, Shengen
author_facet SUN, Peng
WEN, Yonggang
TA, Nguyen Binh Duong
YAN, Shengen
author_sort SUN, Peng
title Towards distributed machine learning in shared clusters: A dynamically-partitioned approach
title_short Towards distributed machine learning in shared clusters: A dynamically-partitioned approach
title_full Towards distributed machine learning in shared clusters: A dynamically-partitioned approach
title_fullStr Towards distributed machine learning in shared clusters: A dynamically-partitioned approach
title_full_unstemmed Towards distributed machine learning in shared clusters: A dynamically-partitioned approach
title_sort towards distributed machine learning in shared clusters: a dynamically-partitioned approach
publisher Institutional Knowledge at Singapore Management University
publishDate 2017
url https://ink.library.smu.edu.sg/sis_research/4766
https://ink.library.smu.edu.sg/context/sis_research/article/5769/viewcontent/1704.06738.pdf
_version_ 1770575025282220032