Replication effect over Hadoop MapReduce performance using regression analysis

Hadoop MapReduce is a community-accepted platform for processing very large datasets efficiently and cost-effectively. To cope with ever-growing datasets and shrinking time to analyze them, Hadoop MapReduce parallelizes computations on large distributed clusters consisting of m...

Bibliographic Details
Main Authors: Shabbir, Aisha, Abu Bakar, Kamalrulnizam, Raja Mohd. Radzi, Raja Zahilah
Format: Article
Published: Foundation of Computer Science (FCS), NY, USA 2018
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/id/eprint/82101/
http://dx.doi.org/10.5120/ijca2018918034
Institution: Universiti Teknologi Malaysia
id my.utm.82101
record_format eprints
spelling my.utm.82101 2019-10-26T02:44:03Z http://eprints.utm.my/id/eprint/82101/ Replication effect over hadoop mapreduce performance using regression analysis Shabbir, Aisha Abu Bakar, Kamalrulnizam Raja Mohd. Radzi, Raja Zahilah QA75 Electronic computers. Computer science Hadoop MapReduce is a community-accepted platform for processing very large datasets efficiently and cost-effectively. To cope with ever-growing datasets and shrinking time to analyze them, Hadoop MapReduce parallelizes computations on large distributed clusters consisting of many machines. Careful consideration of the factors affecting Hadoop MapReduce can enhance its performance. Much research has been done on improving the total job execution time of MapReduce by optimizing different parameters. The effect of the replication factor on MapReduce job completion time, however, is still unexplored. This paper evaluates the effect of the data replication factor on MapReduce job completion time using regression analysis. The performance of a Hadoop MapReduce job, in terms of total job completion time, is monitored experimentally under different values of the replication factor. The evaluation results clearly show that the job completion time depends on the replication factor. This dependence has been verified both analytically and experimentally. Foundation of Computer Science (FCS), NY, USA 2018-10 Article PeerReviewed Shabbir, Aisha and Abu Bakar, Kamalrulnizam and Raja Mohd. Radzi, Raja Zahilah (2018) Replication effect over hadoop mapreduce performance using regression analysis. International Journal of Computer Applications, 181 (24). ISSN 0975-8887 http://dx.doi.org/10.5120/ijca2018918034 DOI:10.5120/ijca2018918034
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
topic QA75 Electronic computers. Computer science
spellingShingle QA75 Electronic computers. Computer science
Shabbir, Aisha
Abu Bakar, Kamalrulnizam
Raja Mohd. Radzi, Raja Zahilah
Replication effect over hadoop mapreduce performance using regression analysis
description Hadoop MapReduce is a community-accepted platform for processing very large datasets efficiently and cost-effectively. To cope with ever-growing datasets and shrinking time to analyze them, Hadoop MapReduce parallelizes computations on large distributed clusters consisting of many machines. Careful consideration of the factors affecting Hadoop MapReduce can enhance its performance. Much research has been done on improving the total job execution time of MapReduce by optimizing different parameters. The effect of the replication factor on MapReduce job completion time, however, is still unexplored. This paper evaluates the effect of the data replication factor on MapReduce job completion time using regression analysis. The performance of a Hadoop MapReduce job, in terms of total job completion time, is monitored experimentally under different values of the replication factor. The evaluation results clearly show that the job completion time depends on the replication factor. This dependence has been verified both analytically and experimentally.
format Article
author Shabbir, Aisha
Abu Bakar, Kamalrulnizam
Raja Mohd. Radzi, Raja Zahilah
author_facet Shabbir, Aisha
Abu Bakar, Kamalrulnizam
Raja Mohd. Radzi, Raja Zahilah
author_sort Shabbir, Aisha
title Replication effect over hadoop mapreduce performance using regression analysis
title_short Replication effect over hadoop mapreduce performance using regression analysis
title_full Replication effect over hadoop mapreduce performance using regression analysis
title_fullStr Replication effect over hadoop mapreduce performance using regression analysis
title_full_unstemmed Replication effect over hadoop mapreduce performance using regression analysis
title_sort replication effect over hadoop mapreduce performance using regression analysis
publisher Foundation of Computer Science (FCS), NY, USA
publishDate 2018
url http://eprints.utm.my/id/eprint/82101/
http://dx.doi.org/10.5120/ijca2018918034
_version_ 1651866606919221248