Big data: Performance profiling of Meteorological and Oceanographic data on Hive

The emergence and development of big data tools, techniques and systems motivate industries and organizations to embrace and explore research in big data. This is to circumvent the challenges of the traditional database systems. However, the available benchmarks and workloads are for some specific a...

Bibliographic Details
Main Authors: Abdullahi, A.U., Ahmad, R., Zakaria, N.M.
Format: Conference or Workshop Item
Published: Institute of Electrical and Electronics Engineers Inc. 2016
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85010433068&doi=10.1109%2fICCOINS.2016.7783215&partnerID=40&md5=c44afcfa573af47fe130803c36767564
http://eprints.utp.edu.my/30482/
Institution: Universiti Teknologi Petronas
id my.utp.eprints.30482
record_format eprints
spelling my.utp.eprints.30482 2022-03-25T06:55:41Z
Institute of Electrical and Electronics Engineers Inc. 2016 Conference or Workshop Item NonPeerReviewed https://www.scopus.com/inward/record.uri?eid=2-s2.0-85010433068&doi=10.1109%2fICCOINS.2016.7783215&partnerID=40&md5=c44afcfa573af47fe130803c36767564 Abdullahi, A.U. and Ahmad, R. and Zakaria, N.M. (2016) Big data: Performance profiling of Meteorological and Oceanographic data on Hive. In: UNSPECIFIED. http://eprints.utp.edu.my/30482/
institution Universiti Teknologi Petronas
building UTP Resource Centre
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Petronas
content_source UTP Institutional Repository
url_provider http://eprints.utp.edu.my/
description The emergence and development of big data tools, techniques and systems motivate industries and organizations to embrace and explore research in big data, in order to circumvent the challenges of traditional database systems. However, the available benchmarks and workloads target specific aspects of the Information Technology industry, whose data differ in nature and complexity from data obtained from other sources. Hence there is a need to use data from other domains to evaluate the performance and maturity of big data technologies. In this paper, the performance profiling of Meteorological and Oceanographic data on Hive is conducted. Hive, being the commonly used data warehouse analytical platform for big data, is chosen with a view to exposing the intricacies involved in formatting and loading the data. The response time for indexed and non-indexed retrievals is measured using three sets of queries frequently used in the area. The query types are Type 1, SELECT with a WHERE clause; Type 2, SELECT with a JOIN clause; and Type 3, SELECT with a GROUP BY clause. The experimental results show that a good response time is achieved for both indexed and non-indexed tables. Indexed retrieval shows a significant decrease in response time for Type 1 queries at all data sizes and for Type 3 queries at data sizes of 100GB and less. It also shows additional overhead for Type 2 queries at all data sizes and for Type 3 queries at data sizes of 500GB and more. When the Meteorological and Oceanographic data are properly formatted, their analytics with Hive prove efficient compared to traditional database systems. The results of this study have the potential to attract oil and gas companies to adopt big data technologies for handling their exploration datasets. © 2016 IEEE.
format Conference or Workshop Item
author Abdullahi, A.U.
Ahmad, R.
Zakaria, N.M.
title Big data: Performance profiling of Meteorological and Oceanographic data on Hive
publisher Institute of Electrical and Electronics Engineers Inc.
publishDate 2016
url https://www.scopus.com/inward/record.uri?eid=2-s2.0-85010433068&doi=10.1109%2fICCOINS.2016.7783215&partnerID=40&md5=c44afcfa573af47fe130803c36767564
http://eprints.utp.edu.my/30482/
_version_ 1738657113833472000
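
The abstract compares response times for three query types (SELECT with WHERE, SELECT with JOIN, SELECT with GROUP BY) on indexed and non-indexed tables. The sketch below illustrates that style of measurement only. It is not the authors' setup: it uses Python's built-in sqlite3 as a small stand-in for Hive, and the table names, column names and generated rows are all hypothetical, since the record does not describe the actual schema or data.

```python
import sqlite3
import time

# Hypothetical schema and synthetic rows; the record does not give the real ones.
def build_db(indexed: bool) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stations (station_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE observations (station_id INTEGER, hour INTEGER, wave_height REAL)"
    )
    conn.executemany(
        "INSERT INTO stations VALUES (?, ?)",
        [(i, f"station_{i}") for i in range(10)],
    )
    rows = [(i % 10, i % 24, (i % 50) / 10.0) for i in range(10000)]
    conn.executemany("INSERT INTO observations VALUES (?, ?, ?)", rows)
    if indexed:
        # Index on the column used by the WHERE, JOIN and GROUP BY queries below.
        conn.execute("CREATE INDEX idx_obs_station ON observations (station_id)")
    conn.commit()
    return conn

# One illustrative query per type named in the abstract.
QUERIES = {
    "type1_where": (
        "SELECT station_id, hour, wave_height FROM observations WHERE station_id = 3"
    ),
    "type2_join": (
        "SELECT s.name, o.wave_height FROM observations o "
        "JOIN stations s ON s.station_id = o.station_id"
    ),
    "type3_group_by": (
        "SELECT station_id, AVG(wave_height) FROM observations GROUP BY station_id"
    ),
}

def profile(conn: sqlite3.Connection) -> dict:
    """Return {query label: (row count, elapsed seconds)} for each query."""
    results = {}
    for label, sql in QUERIES.items():
        start = time.perf_counter()
        fetched = conn.execute(sql).fetchall()
        results[label] = (len(fetched), time.perf_counter() - start)
    return results

plain = profile(build_db(indexed=False))
indexed = profile(build_db(indexed=True))
for label in QUERIES:
    print(f"{label}: non-indexed {plain[label][1]:.6f}s, indexed {indexed[label][1]:.6f}s")
```

Timings from an in-memory toy database say nothing about Hive at the 100GB or 500GB scales reported in the abstract; the point is only the shape of the comparison, with the same queries run against an indexed and a non-indexed copy of the data.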