Optimisation model for scheduling MapReduce jobs in big data processing / Ibrahim Abaker Targio Hashem
With the rapid development of Internet-based technologies, data generation has increased drastically over the past few years, a period that has been termed the big data era. Big data offers a paradigm shift in data exploration and utilization. The major enabler underlying many big data platforms is certainly the MapR...
Main Author: | Ibrahim Abaker Targio Hashem |
---|---|
Format: | Thesis |
Published: | 2017 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | http://studentsrepo.um.edu.my/9755/2/Ibrahim_Abaker_Targio_Hashem.pdf http://studentsrepo.um.edu.my/9755/1/Ibrahim_Abaker_Targio_Hashem_%E2%80%93_Thesis.pdf http://studentsrepo.um.edu.my/9755/ |
Institution: | Universiti Malaya |
id |
my.um.stud.9755 |
---|---|
record_format |
eprints |
spelling |
my.um.stud.9755 2019-05-05T22:44:22Z Optimisation model for scheduling MapReduce jobs in big data processing / Ibrahim Abaker Targio Hashem Ibrahim Abaker, Targio Hashem QA75 Electronic computers. Computer science With the rapid development of Internet-based technologies, data generation has increased drastically over the past few years, a period that has been termed the big data era. Big data offers a paradigm shift in data exploration and utilization. The major enabler underlying many big data platforms is the MapReduce computational paradigm. Scheduling plays an important role in MapReduce, mainly in reducing the execution time of data-intensive jobs. However, despite recent efforts toward improving MapReduce performance, scheduling MapReduce jobs across multiple nodes has been shown to be a multi-objective optimization problem. The problem becomes even more complex when virtualized clusters in a cloud computing environment are used to execute a large number of tasks. The complexity lies in achieving multiple objectives that may conflict with one another. These conflicting requirements and goals are challenging to optimize because it is difficult to predict a new incoming job's behavior and its completion time. In this study, we aim to optimize task scheduling and resource utilization using an evolutionary algorithm based on the proposed completion time and monetary cost of cloud service models. Two multi-objective approaches, the Non-dominated Sorting Genetic Algorithm II (NSGA-II) and the Strength Pareto Evolutionary Algorithm 2 (SPEA2), are applied to find the Pareto front of the makespan and total cost. The results of our experimental analysis reveal the advantage of NSGA-II over SPEA2 on the tested problems, based on the adopted measuring criteria. In addition, the NSGA-II algorithm was able to find the optimal solutions. We then propose a multi-objective scheduling algorithm framework that considers resource allocation and task scheduling in a heterogeneous cloud environment. The proposed algorithm is evaluated using task scheduling in the scheduling load simulator and validated using statistical modeling. The simulation results acquired from the experiments showed the effectiveness of the proposed framework and algorithm. 2017-03-24 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/9755/2/Ibrahim_Abaker_Targio_Hashem.pdf application/pdf http://studentsrepo.um.edu.my/9755/1/Ibrahim_Abaker_Targio_Hashem_%E2%80%93_Thesis.pdf Ibrahim Abaker, Targio Hashem (2017) Optimisation model for scheduling MapReduce jobs in big data processing / Ibrahim Abaker Targio Hashem. PhD thesis, University of Malaya. http://studentsrepo.um.edu.my/9755/ |
institution |
Universiti Malaya |
building |
UM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Malaya |
content_source |
UM Student Repository |
url_provider |
http://studentsrepo.um.edu.my/ |
topic |
QA75 Electronic computers. Computer science |
spellingShingle |
QA75 Electronic computers. Computer science Ibrahim Abaker, Targio Hashem Optimisation model for scheduling MapReduce jobs in big data processing / Ibrahim Abaker Targio Hashem |
description |
With the rapid development of Internet-based technologies, data generation has increased drastically over the past few years, a period that has been termed the big data era. Big data offers a paradigm shift in data exploration and utilization. The major enabler underlying many big data platforms is the MapReduce computational paradigm. Scheduling plays an important role in MapReduce, mainly in reducing the execution time of data-intensive jobs. However, despite recent efforts toward improving MapReduce performance, scheduling MapReduce jobs across multiple nodes has been shown to be a multi-objective optimization problem. The problem becomes even more complex when virtualized clusters in a cloud computing environment are used to execute a large number of tasks. The complexity lies in achieving multiple objectives that may conflict with one another. These conflicting requirements and goals are challenging to optimize because it is difficult to predict a new incoming job's behavior and its completion time. In this study, we aim to optimize task scheduling and resource utilization using an evolutionary algorithm based on the proposed completion time and monetary cost of cloud service models. Two multi-objective approaches, the Non-dominated Sorting Genetic Algorithm II (NSGA-II) and the Strength Pareto Evolutionary Algorithm 2 (SPEA2), are applied to find the Pareto front of the makespan and total cost. The results of our experimental analysis reveal the advantage of NSGA-II over SPEA2 on the tested problems, based on the adopted measuring criteria. In addition, the NSGA-II algorithm was able to find the optimal solutions. We then propose a multi-objective scheduling algorithm framework that considers resource allocation and task scheduling in a heterogeneous cloud environment. The proposed algorithm is evaluated using task scheduling in the scheduling load simulator and validated using statistical modeling. The simulation results acquired from the experiments showed the effectiveness of the proposed framework and algorithm. |
format |
Thesis |
author |
Ibrahim Abaker, Targio Hashem |
author_facet |
Ibrahim Abaker, Targio Hashem |
author_sort |
Ibrahim Abaker, Targio Hashem |
title |
Optimisation model for scheduling MapReduce jobs in big data processing / Ibrahim Abaker Targio Hashem |
title_sort |
optimisation model for scheduling mapreduce jobs in big data processing / ibrahim abaker targio hashem |
publishDate |
2017 |
url |
http://studentsrepo.um.edu.my/9755/2/Ibrahim_Abaker_Targio_Hashem.pdf http://studentsrepo.um.edu.my/9755/1/Ibrahim_Abaker_Targio_Hashem_%E2%80%93_Thesis.pdf http://studentsrepo.um.edu.my/9755/ |
_version_ |
1738506296473157632 |
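
The description above refers to finding the Pareto front of makespan and total cost. As a minimal sketch of what that means, the snippet below filters a set of candidate schedules down to the non-dominated ones, with both objectives minimised. It is only the dominance test that algorithms such as NSGA-II and SPEA2 build on, not an implementation of either algorithm and not the thesis's method. The function names and the sample (makespan, cost) values are hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple

# One candidate schedule, summarised as (makespan, total cost).
# Both objectives are to be minimised. Values below are made up.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b in both objectives
    and strictly better in at least one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by makespan."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(set(front))


# Hypothetical candidates: (makespan in seconds, cost in currency units).
candidates = [
    (120.0, 9.0),
    (150.0, 6.0),
    (150.0, 8.0),
    (200.0, 5.0),
    (210.0, 7.0),
]

front = pareto_front(candidates)
print(front)
```

In this made-up data, a schedule such as (150.0, 8.0) is dropped because (150.0, 6.0) finishes at the same time for less money, while the remaining schedules each trade a shorter makespan against a higher cost. The pairwise filter is quadratic in the number of candidates, which is acceptable for a small illustration.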