Adaptive MapReduce task scheduler in heterogeneous environment using dynamic calibration / Lu Xinzhu

Bibliographic Details
Main Author: Lu, Xinzhu
Format: Thesis
Published: 2017
Online Access: http://studentsrepo.um.edu.my/14244/1/Lu_Xinzhu.pdf
http://studentsrepo.um.edu.my/14244/2/Lu_Xinzhu.pdf
http://studentsrepo.um.edu.my/14244/
Institution: Universiti Malaya
Description
Summary: MapReduce is a popular programming model for processing large-scale datasets in a distributed environment. Current MapReduce implementations assume that every compute node has the same capacity. In a heterogeneous environment, where compute nodes vary in capacity, this assumption may hinder MapReduce performance. Existing work has shown that makespan can be reduced if workloads are assigned in proportion to the capacity of each heterogeneous compute node. However, these approaches are static: the workload is assigned to each compute node based on historical data. This research proposes an adaptive MapReduce task scheduler, the Adaptive MapReduce Task Scheduler Using Dynamic Calibration (AMTS-DC), to address the unbalanced node capacity problem. The AMTS-DC algorithm uses heartbeats and data locality to dynamically adapt and balance the tasks assigned to each compute node. Based on the heartbeats received during the early stage of a job, AMTS-DC estimates the capacity of each compute node. The uncomputed local blocks at each compute node are then reassigned so that compute nodes with greater capacity can reserve more local blocks. Experimental results show that AMTS-DC performs relatively better than Hadoop FIFO and the Dynamic Data Placement Strategy (DDP) in a dynamic heterogeneous environment. AMTS-DC is further enhanced by incorporating historical data; the enhanced version is named Enhanced Adaptive MapReduce Task Scheduler Using Dynamic Calibration (EAMTS-DC). Experimental results show that EAMTS-DC performs better than AMTS-DC.
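
The capacity-proportional idea in the summary can be illustrated with a minimal Python sketch. This is not the AMTS-DC algorithm from the thesis; the summary gives no formulas, so everything here is an assumption made for illustration: capacity is taken to be tasks completed per second as observed over the early heartbeats, and the remaining blocks are split in proportion to that rate using largest-remainder rounding. The function names `estimate_capacity` and `proportional_quota` and the node names are hypothetical.

```python
from typing import Dict


def estimate_capacity(completed_tasks: Dict[str, int],
                      elapsed_seconds: float) -> Dict[str, float]:
    """Estimate node capacity as tasks finished per second.

    Assumption: the counts come from heartbeats collected early in the job.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return {node: done / elapsed_seconds
            for node, done in completed_tasks.items()}


def proportional_quota(capacity: Dict[str, float],
                       remaining_blocks: int) -> Dict[str, int]:
    """Split the remaining blocks in proportion to estimated capacity.

    Largest-remainder rounding keeps the quotas summing exactly to
    remaining_blocks.
    """
    total = sum(capacity.values())
    if total <= 0:
        raise ValueError("at least one node must have positive capacity")
    exact = {n: remaining_blocks * c / total for n, c in capacity.items()}
    quota = {n: int(x) for n, x in exact.items()}  # floor of each share
    leftover = remaining_blocks - sum(quota.values())
    # Give the leftover blocks to the nodes with the largest fractional
    # parts; ties are broken by node name so the result is deterministic.
    by_fraction = sorted(exact, key=lambda n: (exact[n] - quota[n], n),
                         reverse=True)
    for n in by_fraction[:leftover]:
        quota[n] += 1
    return quota


if __name__ == "__main__":
    # Hypothetical observation: tasks completed in the first 2 seconds.
    rates = estimate_capacity({"fast": 12, "mid": 6, "slow": 2}, 2.0)
    print(proportional_quota(rates, 20))
```

A real scheduler would also have to respect data locality, that is, which blocks are actually stored on which node; this sketch only computes how many blocks each node should hold.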