Dynamic Job Ordering and Slot Configurations for MapReduce Workloads

MapReduce is a popular parallel computing paradigm for large-scale data processing in clusters and data centers. A MapReduce workload generally contains a set of jobs, each of which consists of multiple map tasks followed by multiple reduce tasks. Because 1) map tasks can only run in map slots a...

Bibliographic Details
Main Authors:	Tang, Shanjiang; Lee, Bu-Sung; He, Bingsheng
Other Authors:	School of Computer Engineering
Format:	Article
Language:	English
Published:	2016
Subjects:	flow-shops; scheduling algorithm; job ordering; MapReduce; Hadoop
Online Access:	https://hdl.handle.net/10356/80385 http://hdl.handle.net/10220/40666
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-80385
record_format	dspace
spelling	sg-ntu-dr.10356-80385 2020-05-28T07:17:33Z Dynamic Job Ordering and Slot Configurations for MapReduce Workloads Tang, Shanjiang Lee, Bu-Sung He, Bingsheng School of Computer Engineering flow-shops scheduling algorithm job ordering MapReduce Hadoop MapReduce is a popular parallel computing paradigm for large-scale data processing in clusters and data centers. A MapReduce workload generally contains a set of jobs, each of which consists of multiple map tasks followed by multiple reduce tasks. Because 1) map tasks can only run in map slots and reduce tasks can only run in reduce slots, and 2) map tasks must generally execute before reduce tasks, different job execution orders and map/reduce slot configurations for a MapReduce workload yield significantly different performance and system utilization. This paper proposes two classes of algorithms to minimize the makespan and the total completion time for an offline MapReduce workload. Our first class of algorithms focuses on job ordering optimization for a MapReduce workload under a given map/reduce slot configuration. In contrast, our second class of algorithms considers the scenario in which we can optimize the map/reduce slot configuration for a MapReduce workload. We perform simulations as well as experiments on Amazon EC2 and show that our proposed algorithms produce results that are 15 to 80 percent better than the currently unoptimized Hadoop, leading to significant reductions in running time in practice. 2016-06-13T06:29:20Z 2019-12-06T13:48:22Z 2016-06-13T06:29:20Z 2019-12-06T13:48:22Z 2016 2016 Journal Article Tang, S., Lee, B.-S., & He, B. (2016). Dynamic Job Ordering and Slot Configurations for MapReduce Workloads. IEEE Transactions on Services Computing, 9(1), 4-17. 1939-1374 https://hdl.handle.net/10356/80385 http://hdl.handle.net/10220/40666 10.1109/TSC.2015.2426186 187084 en IEEE Transactions on Services Computing © 2016 IEEE.
institution	Nanyang Technological University
building	NTU Library
country	Singapore
collection	DR-NTU
language	English
topic	flow-shops scheduling algorithm job ordering MapReduce Hadoop
spellingShingle	flow-shops scheduling algorithm job ordering MapReduce Hadoop Tang, Shanjiang Lee, Bu-Sung He, Bingsheng Dynamic Job Ordering and Slot Configurations for MapReduce Workloads
description	MapReduce is a popular parallel computing paradigm for large-scale data processing in clusters and data centers. A MapReduce workload generally contains a set of jobs, each of which consists of multiple map tasks followed by multiple reduce tasks. Because 1) map tasks can only run in map slots and reduce tasks can only run in reduce slots, and 2) map tasks must generally execute before reduce tasks, different job execution orders and map/reduce slot configurations for a MapReduce workload yield significantly different performance and system utilization. This paper proposes two classes of algorithms to minimize the makespan and the total completion time for an offline MapReduce workload. Our first class of algorithms focuses on job ordering optimization for a MapReduce workload under a given map/reduce slot configuration. In contrast, our second class of algorithms considers the scenario in which we can optimize the map/reduce slot configuration for a MapReduce workload. We perform simulations as well as experiments on Amazon EC2 and show that our proposed algorithms produce results that are 15 to 80 percent better than the currently unoptimized Hadoop, leading to significant reductions in running time in practice.
author2	School of Computer Engineering
author_facet	School of Computer Engineering Tang, Shanjiang Lee, Bu-Sung He, Bingsheng
format	Article
author	Tang, Shanjiang Lee, Bu-Sung He, Bingsheng
author_sort	Tang, Shanjiang
title	Dynamic Job Ordering and Slot Configurations for MapReduce Workloads
title_short	Dynamic Job Ordering and Slot Configurations for MapReduce Workloads
title_full	Dynamic Job Ordering and Slot Configurations for MapReduce Workloads
title_fullStr	Dynamic Job Ordering and Slot Configurations for MapReduce Workloads
title_full_unstemmed	Dynamic Job Ordering and Slot Configurations for MapReduce Workloads
title_sort	dynamic job ordering and slot configurations for mapreduce workloads
publishDate	2016
url	https://hdl.handle.net/10356/80385 http://hdl.handle.net/10220/40666
_version_	1681057665669660672
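
Illustrative note: the description above states that job execution order changes the makespan of a MapReduce workload, and the subject terms mention flow-shop scheduling. The sketch below is not taken from the article. It assumes a heavily simplified model in which each job is reduced to one map duration and one reduce duration, with a single map slot and a single reduce slot (a two-stage flow shop). Under that assumption, Johnson's rule, a classic flow-shop ordering, gives a minimum-makespan order. The function names and the job durations are hypothetical.

```python
from itertools import permutations

# Each job is a (map_time, reduce_time) pair; the values are made up.
jobs = [(5, 2), (1, 6), (9, 7), (3, 8), (10, 4)]


def makespan(ordered_jobs):
    """Finish time of the last reduce phase when jobs run in the given order.

    Assumes one map slot and one reduce slot, and that a job's reduce phase
    starts only after its own map phase and the previous reduce phase end.
    """
    map_done = 0
    reduce_done = 0
    for map_time, reduce_time in ordered_jobs:
        map_done += map_time
        reduce_done = max(reduce_done, map_done) + reduce_time
    return reduce_done


def johnson_order(job_list):
    """Order jobs by Johnson's rule for a two-stage flow shop.

    Jobs whose map phase is no longer than their reduce phase go first,
    by increasing map time; the remaining jobs follow, by decreasing
    reduce time.
    """
    first = sorted((j for j in job_list if j[0] <= j[1]), key=lambda j: j[0])
    last = sorted((j for j in job_list if j[0] > j[1]),
                  key=lambda j: j[1], reverse=True)
    return first + last


def best_makespan(job_list):
    """Brute-force minimum makespan, only feasible for a handful of jobs."""
    return min(makespan(order) for order in permutations(job_list))


print(makespan(jobs))                  # submission order: 34
print(makespan(johnson_order(jobs)))   # Johnson's rule order: 30
print(best_makespan(jobs))             # exhaustive optimum: 30
```

In this toy instance, reordering alone shortens the makespan from 34 to 30 time units. Real clusters have many slots and many tasks per job, so this two-slot model only illustrates why ordering matters and does not reproduce the article's algorithms or results.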
