Adaptive Mapreduce task scheduler in heterogeneous environment using dynamic calibration / Lu Xinzhu

MapReduce is a popular programming model for processing large-scale datasets in a distributed environment. Currently, the MapReduce implementation is based on the assumption that every compute node has the same capacity. However, in a heterogeneous environment, such assumptions may hinder the MapRed...

Bibliographic Details
Main Author: Lu, Xinzhu
Format: Thesis
Published: 2017
Subjects: QA75 Electronic computers. Computer science
Online Access: http://studentsrepo.um.edu.my/14244/1/Lu_Xinzhu.pdf
http://studentsrepo.um.edu.my/14244/2/Lu_Xinzhu.pdf
http://studentsrepo.um.edu.my/14244/
Institution: Universiti Malaya
id my.um.stud.14244
record_format eprints
spelling my.um.stud.14244 2023-04-11T20:21:38Z
2017-11 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/14244/1/Lu_Xinzhu.pdf application/pdf http://studentsrepo.um.edu.my/14244/2/Lu_Xinzhu.pdf Lu , Xinzhu (2017) Adaptive Mapreduce task scheduler in heterogeneous environment using dynamic calibration / Lu Xinzhu. Masters thesis, Universiti Malaya. http://studentsrepo.um.edu.my/14244/
institution Universiti Malaya
building UM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaya
content_source UM Student Repository
url_provider http://studentsrepo.um.edu.my/
topic QA75 Electronic computers. Computer science
description MapReduce is a popular programming model for processing large-scale datasets in a distributed environment. Current MapReduce implementations assume that every compute node has the same capacity. In a heterogeneous environment, where compute nodes vary in capacity, this assumption may hinder MapReduce performance. Existing work has shown that makespan can be reduced if workloads are assigned in proportion to the capacity of each heterogeneous compute node. However, these approaches are static: the workload is assigned to each compute node based on historical data. This research proposes an adaptive MapReduce task scheduler, the Adaptive MapReduce Task Scheduler Using Dynamic Calibration (AMTS-DC), to address the unbalanced node capacity problem. AMTS-DC uses heartbeats and data locality to dynamically adapt and balance the tasks assigned to each compute node. Based on the heartbeats received during the early stage of a job, AMTS-DC estimates the capacity of each compute node. Uncomputed local blocks at each compute node are then reassigned so that compute nodes with greater capacity reserve more local blocks. Experimental results show that AMTS-DC performs relatively better than Hadoop FIFO and the Dynamic Data Placement Strategy (DDP) in a dynamic heterogeneous environment. AMTS-DC has been further enhanced with the introduction of historical data; the enhanced version is named Enhanced Adaptive MapReduce Task Scheduler using Dynamic Calibration (EAMTS-DC). Experimental results show that EAMTS-DC performs better than AMTS-DC.
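The idea of estimating node capacity from early progress and then dividing the remaining blocks in proportion to that capacity can be illustrated with a minimal Python sketch. This is not the thesis's AMTS-DC implementation: the function names, the tasks-per-second capacity measure, and the largest-remainder rounding are all assumptions made for illustration, and data locality is ignored entirely.

```python
def estimate_capacity(completed_tasks, elapsed_seconds):
    """Hypothetical capacity measure: tasks finished per second on each node,
    as could be derived from progress reported in early heartbeats."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return {node: done / elapsed_seconds for node, done in completed_tasks.items()}


def proportional_shares(capacity, total_blocks):
    """Split total_blocks across nodes in proportion to capacity.

    Uses largest-remainder rounding (an assumption, not from the thesis) so
    that the integer shares always sum to total_blocks.
    """
    total_capacity = sum(capacity.values())
    if total_capacity <= 0:
        raise ValueError("at least one node must have positive capacity")

    # Ideal (fractional) share of blocks for each node.
    exact = {n: total_blocks * c / total_capacity for n, c in capacity.items()}
    # Start from the floor of each share.
    shares = {n: int(x) for n, x in exact.items()}
    # Hand out the blocks lost to rounding, largest fractional part first.
    leftover = total_blocks - sum(shares.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - shares[n], reverse=True)
    for n in by_remainder[:leftover]:
        shares[n] += 1
    return shares


if __name__ == "__main__":
    # Two hypothetical nodes observed over the first 60 seconds of a job.
    capacity = estimate_capacity({"fast": 6, "slow": 2}, elapsed_seconds=60.0)
    print(proportional_shares(capacity, total_blocks=8))
```

A real scheduler would also have to respect which blocks are stored locally on each node, which this sketch leaves out.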
format Thesis
author Lu, Xinzhu
title Adaptive Mapreduce task scheduler in heterogeneous environment using dynamic calibration / Lu Xinzhu
publishDate 2017