Big data: Performance profiling of Meteorological and Oceanographic data on Hive

Bibliographic Details
Main Authors: Abdullahi, A.U., Ahmad, R., Zakaria, N.M.
Format: Conference or Workshop Item
Published: Institute of Electrical and Electronics Engineers Inc. 2016
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85010433068&doi=10.1109%2fICCOINS.2016.7783215&partnerID=40&md5=c44afcfa573af47fe130803c36767564
http://eprints.utp.edu.my/30482/
Institution: Universiti Teknologi Petronas
Description
Summary: The emergence and development of big data tools, techniques and systems motivate industries and organizations to embrace and explore research in big data, in order to circumvent the challenges of traditional database systems. However, the available benchmarks and workloads target specific segments of the Information Technology industry, whose data differ in nature and complexity from data obtained from other sources. Hence there is a need to use data from other domains to evaluate the performance and maturity of big data technologies. In this paper, performance profiling of Meteorological and Oceanographic data on Hive is conducted. Hive, being a commonly used data warehouse analytical platform for big data, is chosen with a view to exposing the intricacies involved in formatting and loading the data. The response time for indexed and non-indexed retrievals is measured using three sets of queries frequently used in the area. The query types are Type 1, SELECT with a WHERE clause; Type 2, SELECT with a JOIN clause; and Type 3, SELECT with a GROUP BY clause. The experimental results show that a good response time is achieved for both indexed and non-indexed tables. Indexed retrieval shows a significant decrease in response time for the Type 1 query at all data sizes and for the Type 3 query at data sizes of 100 GB and less. It also shows additional overhead for the Type 2 query at all data sizes and for the Type 3 query at data sizes of 500 GB and more. When the Meteorological and Oceanographic data are properly formatted, their analytics with Hive prove to be efficient compared to traditional database systems. The results of this study have the potential to attract oil and gas companies to adopt big data technologies for handling their exploration datasets. © 2016 IEEE.
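The three query types and the indexed versus non-indexed comparison described in the summary can be illustrated with a minimal sketch. It is not the authors' setup: it uses Python's built-in sqlite3 module as a small stand-in for Hive, and the table names, column names and synthetic rows (stations, observations, wave_height) are hypothetical. Timings from such a toy database say nothing about the response times reported in the paper; the sketch only shows the shape of each query type and how the two retrieval modes might be timed side by side.

```python
import sqlite3
import time

# Hypothetical schema and queries: names are illustrative, not from the paper.
QUERIES = {
    # Type 1: SELECT with WHERE clause
    "type1_where": (
        "SELECT obs_id, wave_height FROM observations WHERE station_id = 7"
    ),
    # Type 2: SELECT with JOIN clause
    "type2_join": (
        "SELECT s.name, o.wave_height FROM observations o "
        "JOIN stations s ON o.station_id = s.station_id"
    ),
    # Type 3: SELECT with GROUP BY clause
    "type3_group_by": (
        "SELECT station_id, AVG(wave_height) FROM observations "
        "GROUP BY station_id"
    ),
}


def build_db(indexed, n_rows=20000, n_stations=50):
    """Create an in-memory database with synthetic rows.

    When `indexed` is True, an index is added on observations.station_id.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stations (station_id INTEGER, name TEXT)")
    conn.execute(
        "CREATE TABLE observations "
        "(obs_id INTEGER, station_id INTEGER, wave_height REAL)"
    )
    conn.executemany(
        "INSERT INTO stations VALUES (?, ?)",
        [(i, f"station_{i}") for i in range(n_stations)],
    )
    conn.executemany(
        "INSERT INTO observations VALUES (?, ?, ?)",
        [(i, i % n_stations, (i % 100) / 10.0) for i in range(n_rows)],
    )
    if indexed:
        conn.execute(
            "CREATE INDEX idx_obs_station ON observations (station_id)"
        )
    conn.commit()
    return conn


def profile(conn):
    """Run each query once; return {name: (elapsed_seconds, row_count)}."""
    results = {}
    for name, sql in QUERIES.items():
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        results[name] = (time.perf_counter() - start, len(rows))
    return results


if __name__ == "__main__":
    for indexed in (False, True):
        label = "indexed" if indexed else "non-indexed"
        for name, (elapsed, count) in profile(build_db(indexed)).items():
            print(f"{label:12s} {name:15s} {elapsed:.6f}s {count} rows")
```

Both runs return the same rows for every query; only the elapsed time differs, which is the quantity the study compares across data sizes.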