Measuring the Performance of an Object-Based Multi-Cloud Data Lake

Bibliographic Details
Main Authors: Saavedra, Miguel Zenon Nicanor L.; Yu, William Emmanuel S.
Format: text
Published: Archīum Ateneo 2023
Online Access: https://archium.ateneo.edu/discs-faculty-pubs/388
https://doi.org/10.1007/978-981-99-3243-6_4
Institution: Ateneo De Manila University
Description
Summary: As the data generated by society grows in volume and becomes less structured, more organizations are implementing data lakes in the public cloud to store, process, and analyze it. However, concerns over the availability of this data and the potential for vendor lock-in are leading more users to adopt a multi-cloud approach. This study investigates the viability of that approach for data lake use cases. Results show that a multi-cloud data lake can potentially be implemented with less than a 1% performance impact on query run times, at the cost of a 300% increase in one-time loading. This opens the door for future work on algorithms and implementations that leverage multi-cloud deployments to enhance availability, scalability, and cost optimization.