Optimisation model for scheduling MapReduce jobs in big data processing / Ibrahim Abaker Targio Hashem

With the fast development of Internet-based technologies, data generation has increased drastically over the past few years, a period often called the big data era. Big data offers a paradigm shift in data exploration and utilization. The major enabler underlying many big data platforms is the MapR...

Bibliographic Details
Main Author:	Ibrahim Abaker , Targio Hashem
Format:	Thesis
Published:	2017
Subjects:	QA75 Electronic computers. Computer science
Online Access:	http://studentsrepo.um.edu.my/9755/2/Ibrahim_Abaker_Targio_Hashem.pdf http://studentsrepo.um.edu.my/9755/1/Ibrahim_Abaker_Targio_Hashem_%E2%80%93_Thesis.pdf http://studentsrepo.um.edu.my/9755/
Institution:	Universiti Malaya

id	my.um.stud.9755
record_format	eprints
spelling	my.um.stud.9755 2019-05-05T22:44:22Z Optimisation model for scheduling MapReduce jobs in big data processing / Ibrahim Abaker Targio Hashem Ibrahim Abaker , Targio Hashem QA75 Electronic computers. Computer science With the fast development of Internet-based technologies, data generation has increased drastically over the past few years, a period often called the big data era. Big data offers a paradigm shift in data exploration and utilization. The major enabler underlying many big data platforms is the MapReduce computational paradigm. Scheduling plays an important role in MapReduce, mainly in reducing the execution time of data-intensive jobs. However, despite recent efforts toward improving MapReduce performance, scheduling MapReduce jobs across multiple nodes has been shown to be a multi-objective optimization problem. The problem is even more complex when virtualized clusters in a cloud computing environment are used to execute a large number of tasks. The complexity lies in achieving multiple objectives that may conflict with one another. These conflicting requirements and goals are challenging to optimize because it is difficult to predict a new incoming job's behavior and its completion time. In this study, we aim to optimize task scheduling and resource utilization using an evolutionary algorithm based on the proposed completion time and monetary cost of cloud service models. Two multi-objective approaches, the Non-dominated Sorting Genetic Algorithm II (NSGA-II) and the Strength Pareto Evolutionary Algorithm 2 (SPEA2), are applied to find the Pareto front of the makespan and total cost. Our experimental analysis reveals an advantage of NSGA-II over SPEA2 on the tested problems under the adopted measuring criteria. In addition, the NSGA-II algorithm was able to find the optimal solutions. We then proposed a multi-objective scheduling algorithm framework that considers resource allocation and task scheduling in a heterogeneous cloud environment. The proposed algorithm is evaluated on task scheduling in the scheduling load simulator and validated using statistical modeling. The simulation results acquired from the experiments showed the effectiveness of the proposed framework and algorithm. 2017-03-24 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/9755/2/Ibrahim_Abaker_Targio_Hashem.pdf application/pdf http://studentsrepo.um.edu.my/9755/1/Ibrahim_Abaker_Targio_Hashem_%E2%80%93_Thesis.pdf Ibrahim Abaker , Targio Hashem (2017) Optimisation model for scheduling MapReduce jobs in big data processing / Ibrahim Abaker Targio Hashem. PhD thesis, University of Malaya. http://studentsrepo.um.edu.my/9755/
institution	Universiti Malaya
building	UM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Malaya
content_source	UM Student Repository
url_provider	http://studentsrepo.um.edu.my/
topic	QA75 Electronic computers. Computer science
spellingShingle	QA75 Electronic computers. Computer science Ibrahim Abaker , Targio Hashem Optimisation model for scheduling MapReduce jobs in big data processing / Ibrahim Abaker Targio Hashem
description	With the fast development of Internet-based technologies, data generation has increased drastically over the past few years, a period often called the big data era. Big data offers a paradigm shift in data exploration and utilization. The major enabler underlying many big data platforms is the MapReduce computational paradigm. Scheduling plays an important role in MapReduce, mainly in reducing the execution time of data-intensive jobs. However, despite recent efforts toward improving MapReduce performance, scheduling MapReduce jobs across multiple nodes has been shown to be a multi-objective optimization problem. The problem is even more complex when virtualized clusters in a cloud computing environment are used to execute a large number of tasks. The complexity lies in achieving multiple objectives that may conflict with one another. These conflicting requirements and goals are challenging to optimize because it is difficult to predict a new incoming job's behavior and its completion time. In this study, we aim to optimize task scheduling and resource utilization using an evolutionary algorithm based on the proposed completion time and monetary cost of cloud service models. Two multi-objective approaches, the Non-dominated Sorting Genetic Algorithm II (NSGA-II) and the Strength Pareto Evolutionary Algorithm 2 (SPEA2), are applied to find the Pareto front of the makespan and total cost. Our experimental analysis reveals an advantage of NSGA-II over SPEA2 on the tested problems under the adopted measuring criteria. In addition, the NSGA-II algorithm was able to find the optimal solutions. We then proposed a multi-objective scheduling algorithm framework that considers resource allocation and task scheduling in a heterogeneous cloud environment. The proposed algorithm is evaluated on task scheduling in the scheduling load simulator and validated using statistical modeling. The simulation results acquired from the experiments showed the effectiveness of the proposed framework and algorithm.
format	Thesis
author	Ibrahim Abaker , Targio Hashem
author_facet	Ibrahim Abaker , Targio Hashem
author_sort	Ibrahim Abaker , Targio Hashem
title	Optimisation model for scheduling MapReduce jobs in big data processing / Ibrahim Abaker Targio Hashem
title_short	Optimisation model for scheduling MapReduce jobs in big data processing / Ibrahim Abaker Targio Hashem
title_full	Optimisation model for scheduling MapReduce jobs in big data processing / Ibrahim Abaker Targio Hashem
title_fullStr	Optimisation model for scheduling MapReduce jobs in big data processing / Ibrahim Abaker Targio Hashem
title_full_unstemmed	Optimisation model for scheduling MapReduce jobs in big data processing / Ibrahim Abaker Targio Hashem
title_sort	optimisation model for scheduling mapreduce jobs in big data processing / ibrahim abaker targio hashem
publishDate	2017
url	http://studentsrepo.um.edu.my/9755/2/Ibrahim_Abaker_Targio_Hashem.pdf http://studentsrepo.um.edu.my/9755/1/Ibrahim_Abaker_Targio_Hashem_%E2%80%93_Thesis.pdf http://studentsrepo.um.edu.my/9755/
_version_	1738506296473157632

Similar Items