Optimisation model for scheduling MapReduce jobs in big data processing / Ibrahim Abaker Targio Hashem

With the fast development of Internet-based technologies, data generation has increased drastically over the past few years, giving rise to what has been termed the big data era. Big data offers a paradigm shift in data exploration and utilization. The major enabler underlying many big data platforms is certainly the MapR...

Bibliographic Details
Main Author: Ibrahim Abaker , Targio Hashem
Format: Thesis
Published: 2017
Subjects: QA75 Electronic computers. Computer science
Online Access: http://studentsrepo.um.edu.my/9755/2/Ibrahim_Abaker_Targio_Hashem.pdf
http://studentsrepo.um.edu.my/9755/1/Ibrahim_Abaker_Targio_Hashem_%E2%80%93_Thesis.pdf
http://studentsrepo.um.edu.my/9755/
Institution: Universiti Malaya
id my.um.stud.9755
record_format eprints
spelling my.um.stud.9755 2019-05-05T22:44:22Z Optimisation model for scheduling MapReduce jobs in big data processing / Ibrahim Abaker Targio Hashem Ibrahim Abaker , Targio Hashem QA75 Electronic computers. Computer science With the fast development of Internet-based technologies, data generation has increased drastically over the past few years, giving rise to what has been termed the big data era. Big data offers a paradigm shift in data exploration and utilization. The major enabler underlying many big data platforms is certainly the MapReduce computational paradigm. Scheduling plays an important role in MapReduce, mainly in reducing the execution time of data-intensive jobs. However, despite recent efforts toward improving MapReduce performance, scheduling MapReduce jobs across multiple nodes has been shown to be a multi-objective optimization problem. The problem is even more complex when virtualized clusters in a cloud computing environment are used to execute a large number of tasks. The complexity lies in achieving multiple objectives that may conflict with one another. These conflicting requirements and goals are challenging to optimize because it is difficult to predict a new incoming job's behavior and its completion time. In this study, we aim to optimize task scheduling and resource utilization using an evolutionary algorithm based on the proposed completion time and monetary cost of cloud service models. Two multi-objective approaches, the Non-dominated Sorting Genetic Algorithm II (NSGA-II) and the Strength Pareto Evolutionary Algorithm 2 (SPEA2), are applied to find the Pareto front of the makespan and total cost. Our experimental analysis reveals the advantage of NSGA-II over SPEA2 on the tested problems, based on the adopted measuring criteria. In addition, the NSGA-II algorithm was able to find the optimal solutions. We then propose a multi-objective scheduling algorithm framework that considers resource allocation and task scheduling in a heterogeneous cloud environment. The proposed algorithm is evaluated using task scheduling in the scheduling load simulator and validated using statistical modeling. The simulation results acquired from the experiments show the effectiveness of the proposed framework and algorithm. 2017-03-24 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/9755/2/Ibrahim_Abaker_Targio_Hashem.pdf application/pdf http://studentsrepo.um.edu.my/9755/1/Ibrahim_Abaker_Targio_Hashem_%E2%80%93_Thesis.pdf Ibrahim Abaker , Targio Hashem (2017) Optimisation model for scheduling MapReduce jobs in big data processing / Ibrahim Abaker Targio Hashem. PhD thesis, University of Malaya. http://studentsrepo.um.edu.my/9755/
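The abstract names two objectives, makespan and total monetary cost, without giving their formulas in this record. The Python sketch below shows one common way such objectives might be computed for a task-to-VM assignment. The `VM` class, the `evaluate` function, the per-hour pricing and every number are hypothetical illustrations, not the thesis's own completion time or cost models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    speed: float           # work units processed per hour (assumed unit)
    price_per_hour: float  # monetary cost per hour of use (assumed pricing)


def evaluate(assignment, task_sizes, vms):
    """Return (makespan, total_cost) for a task-to-VM assignment.

    assignment[i] is the index of the VM that runs task i.
    """
    busy = [0.0] * len(vms)  # hours each VM spends running its tasks
    for task, vm_index in enumerate(assignment):
        busy[vm_index] += task_sizes[task] / vms[vm_index].speed
    makespan = max(busy)  # finish time of the busiest VM
    total_cost = sum(hours * vm.price_per_hour for hours, vm in zip(busy, vms))
    return makespan, total_cost


# Hypothetical cluster: one slow, cheap VM and one fast, expensive VM.
vms = [VM(speed=1.0, price_per_hour=1.0), VM(speed=2.0, price_per_hour=3.0)]
task_sizes = [4.0, 2.0, 2.0]

all_on_slow = evaluate([0, 0, 0], task_sizes, vms)  # (8.0, 8.0)
split = evaluate([1, 0, 1], task_sizes, vms)        # (3.0, 11.0)
```

In this toy case the split schedule finishes sooner but costs more, which is the kind of conflict between objectives the abstract describes.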
institution Universiti Malaya
building UM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaya
content_source UM Student Repository
url_provider http://studentsrepo.um.edu.my/
topic QA75 Electronic computers. Computer science
spellingShingle QA75 Electronic computers. Computer science
Ibrahim Abaker , Targio Hashem
Optimisation model for scheduling MapReduce jobs in big data processing / Ibrahim Abaker Targio Hashem
description With the fast development of Internet-based technologies, data generation has increased drastically over the past few years, giving rise to what has been termed the big data era. Big data offers a paradigm shift in data exploration and utilization. The major enabler underlying many big data platforms is certainly the MapReduce computational paradigm. Scheduling plays an important role in MapReduce, mainly in reducing the execution time of data-intensive jobs. However, despite recent efforts toward improving MapReduce performance, scheduling MapReduce jobs across multiple nodes has been shown to be a multi-objective optimization problem. The problem is even more complex when virtualized clusters in a cloud computing environment are used to execute a large number of tasks. The complexity lies in achieving multiple objectives that may conflict with one another. These conflicting requirements and goals are challenging to optimize because it is difficult to predict a new incoming job's behavior and its completion time. In this study, we aim to optimize task scheduling and resource utilization using an evolutionary algorithm based on the proposed completion time and monetary cost of cloud service models. Two multi-objective approaches, the Non-dominated Sorting Genetic Algorithm II (NSGA-II) and the Strength Pareto Evolutionary Algorithm 2 (SPEA2), are applied to find the Pareto front of the makespan and total cost. Our experimental analysis reveals the advantage of NSGA-II over SPEA2 on the tested problems, based on the adopted measuring criteria. In addition, the NSGA-II algorithm was able to find the optimal solutions. We then propose a multi-objective scheduling algorithm framework that considers resource allocation and task scheduling in a heterogeneous cloud environment. The proposed algorithm is evaluated using task scheduling in the scheduling load simulator and validated using statistical modeling. The simulation results acquired from the experiments show the effectiveness of the proposed framework and algorithm.
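The description refers to finding the Pareto front of makespan and total cost. As a minimal sketch of what that front is, the Python code below filters a set of candidate (makespan, cost) pairs down to the non-dominated ones, assuming both objectives are minimised. It is only the dominance test that algorithms such as NSGA-II and SPEA2 build on, not an implementation of either algorithm; the function names and candidate values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better
    in at least one (all objectives are minimised)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (makespan, total_cost) values for five candidate schedules.
candidates = [(8.0, 8.0), (3.0, 11.0), (5.0, 9.0), (6.0, 12.0), (3.0, 14.0)]
front = pareto_front(candidates)
# front == [(8.0, 8.0), (3.0, 11.0), (5.0, 9.0)]
```

Here (6.0, 12.0) is dropped because (5.0, 9.0) is both faster and cheaper, and (3.0, 14.0) is dropped because (3.0, 11.0) matches its makespan at lower cost. The three remaining schedules are trade-offs that none of the other candidates beats on both objectives.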
format Thesis
author Ibrahim Abaker , Targio Hashem
author_facet Ibrahim Abaker , Targio Hashem
author_sort Ibrahim Abaker , Targio Hashem
title Optimisation model for scheduling MapReduce jobs in big data processing / Ibrahim Abaker Targio Hashem
title_sort optimisation model for scheduling mapreduce jobs in big data processing / ibrahim abaker targio hashem
publishDate 2017
url http://studentsrepo.um.edu.my/9755/2/Ibrahim_Abaker_Targio_Hashem.pdf
http://studentsrepo.um.edu.my/9755/1/Ibrahim_Abaker_Targio_Hashem_%E2%80%93_Thesis.pdf
http://studentsrepo.um.edu.my/9755/
_version_ 1738506296473157632