Optimization techniques on job scheduling and resource allocation for MapReduce system
MapReduce has become a popular high performance computing paradigm for large-scale data processing. Hadoop, an open source implementation of MapReduce, has been widely deployed in large clusters containing thousands of machines by companies such as Yahoo! and Facebook to support batch processing for...
Saved in:
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: |
2015
|
Subjects: | |
Online Access: | https://hdl.handle.net/10356/64312 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Nanyang Technological University |
Language: | English |
id |
sg-ntu-dr.10356-64312 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-643122023-03-04T00:50:03Z Optimization techniques on job scheduling and resource allocation for MapReduce system Tang, Shanjiang Lee Bu Sung, Francis School of Computer Engineering Parallel and Distributed Computing Centre DRNTU::Engineering::Computer science and engineering::Computer systems organization::Performance of systems DRNTU::Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems MapReduce has become a popular high performance computing paradigm for large-scale data processing. Hadoop, an open source implementation of MapReduce, has been widely deployed in large clusters containing thousands of machines by companies such as Yahoo! and Facebook to support batch processing for large jobs submitted from multiple users (i.e., MapReduce workloads). However, there are certainly a lot of room to improve the performance and fairness of Hadoop. In this thesis, we focus on optimization techniques on job scheduling and resource allocation to improve the performance and fairness of Hadoop system. First, we focus on the performance optimization for MapReduce workloads under FIFO scheduler without changing the source code of Hadoop by using job re-ordering approach. We consider two different kinds of production workloads, i.e., offline MapReduce workloads and online MapReduce workloads. The performance metrics used are makespan and total completion time. For offline workloads, we propose several job ordering algorithms. Based on the offline approaches, we further propose a prototype system called MROrder to optimize the performance for online MapReduce workloads. The experimental results show that our job ordering methods can significantly improve the performance of Hadoop for both offline and online workloads. Second, instead of keeping the default static MapReduce resource allocation model where the number of map slots and reduce slots are pre-configured and not fungible, we relax the model constrain to allow slots to be reallocated to either map or reduce tasks depending on their needs through modifying the source code of Hadoop. A dynamic fair resource allocation and scheduling system called DynamicMR is proposed and implemented in Hadoop. It can improve the performance of MapReduce workloads while ensuring the fairness without any information about MapReduce jobs. The experimental results validate the effectiveness of our DynamicMR. Moreover, it can also be applied for FIFO scheduler. Finally, besides the optimization for Hadoop MRv1, we also optimize the fair resource allocation for YARN (i.e., Hadoop MRv2). Specifically, we consider pay-as-you-use computing (e.g., cloud computing) and find that the traditional fair policy is not suitable for such computing system. To address this, we propose a Long-Term Resource Fairness (LTRF) and implement it in YARN by developing LTYARN, a long-term YARN fair scheduler. Our experimental results show that it leads to better resource fairness than existing fair scheduler. Thus, in this thesis, we have addressed some of the optimization problems on job scheduling and resource allocation for MapReduce system under different scenarios. We have proposed new algorithms and frameworks to improve the performance and fairness for Hadoop system. The proposed algorithms and frameworks will be options for users who want to optimize the performance of their MapReduce workloads or ensure fairness, according to their needs and conditions. DOCTOR OF PHILOSOPHY (SCE) 2015-05-26T02:05:15Z 2015-05-26T02:05:15Z 2015 2015 Thesis Tang, S. (2015). Optimization techniques on job scheduling and resource allocation for MapReduce system. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/64312 10.32657/10356/64312 en 157 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Computer systems organization::Performance of systems DRNTU::Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems |
spellingShingle |
DRNTU::Engineering::Computer science and engineering::Computer systems organization::Performance of systems DRNTU::Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems Tang, Shanjiang Optimization techniques on job scheduling and resource allocation for MapReduce system |
description |
MapReduce has become a popular high performance computing paradigm for large-scale data processing. Hadoop, an open source implementation of MapReduce, has been widely deployed in large clusters containing thousands of machines by companies such as Yahoo! and Facebook to support batch processing for large jobs submitted from multiple users (i.e., MapReduce workloads). However, there are certainly a lot of room to improve the performance and fairness of Hadoop. In this thesis, we focus on optimization techniques on job scheduling and resource allocation to improve the performance and fairness of Hadoop system. First, we focus on the performance optimization for MapReduce workloads under FIFO scheduler without changing the source code of Hadoop by using job re-ordering approach. We consider two different kinds of production workloads, i.e., offline MapReduce workloads and online MapReduce workloads. The performance metrics used are makespan and total completion time. For offline workloads, we propose several job ordering algorithms. Based on the offline approaches, we further propose a prototype system called MROrder to optimize the performance for online MapReduce workloads. The experimental results show that our job ordering methods can significantly improve the performance of Hadoop for both offline and online workloads. Second, instead of keeping the default static MapReduce resource allocation model where the number of map slots and reduce slots are pre-configured and not fungible, we relax the model constrain to allow slots to be reallocated to either map or reduce tasks depending on their needs through modifying the source code of Hadoop. A dynamic fair resource allocation and scheduling system called DynamicMR is proposed and implemented in Hadoop. It can improve the performance of MapReduce workloads while ensuring the fairness without any information about MapReduce jobs. The experimental results validate the effectiveness of our DynamicMR. Moreover, it can also be applied for FIFO scheduler. Finally, besides the optimization for Hadoop MRv1, we also optimize the fair resource allocation for YARN (i.e., Hadoop MRv2). Specifically, we consider pay-as-you-use computing (e.g., cloud computing) and find that the traditional fair policy is not suitable for such computing system. To address this, we propose a Long-Term Resource Fairness (LTRF) and implement it in YARN by developing LTYARN, a long-term YARN fair scheduler. Our experimental results show that it leads to better resource fairness than existing fair scheduler. Thus, in this thesis, we have addressed some of the optimization problems on job scheduling and resource allocation for MapReduce system under different scenarios. We have proposed new algorithms and frameworks to improve the performance and fairness for Hadoop system. The proposed algorithms and frameworks will be options for users who want to optimize the performance of their MapReduce workloads or ensure fairness, according to their needs and conditions. |
author2 |
Lee Bu Sung, Francis |
author_facet |
Lee Bu Sung, Francis Tang, Shanjiang |
format |
Theses and Dissertations |
author |
Tang, Shanjiang |
author_sort |
Tang, Shanjiang |
title |
Optimization techniques on job scheduling and resource allocation for MapReduce system |
title_short |
Optimization techniques on job scheduling and resource allocation for MapReduce system |
title_full |
Optimization techniques on job scheduling and resource allocation for MapReduce system |
title_fullStr |
Optimization techniques on job scheduling and resource allocation for MapReduce system |
title_full_unstemmed |
Optimization techniques on job scheduling and resource allocation for MapReduce system |
title_sort |
optimization techniques on job scheduling and resource allocation for mapreduce system |
publishDate |
2015 |
url |
https://hdl.handle.net/10356/64312 |
_version_ |
1759856375823859712 |