DATA LAKEHOUSE ARCHITECTURE USING APACHE ICEBERG FOR BIG DATA MANAGEMENT AT BANK X

Bibliographic Details
Main Author: Mareta Putri Leiden, Nadia
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/85014
Institution: Institut Teknologi Bandung
Description
Summary: This undergraduate thesis presents data architecture recommendations for managing big data in the form of an Apache Iceberg-compatible Data Lakehouse. The proposed architecture is designed to solve problems that arise from the two-tier architecture approach currently implemented by Bank X to manage its big data. The research begins by examining Bank X’s current data architecture and analyzing its needs in managing big data; the recommended architecture is then designed specifically around those needs. Two architectures are proposed, one based on MinIO storage and one on HDFS storage, each with other components such as a catalog, table format, file format, compute engine, and storage engine. The performance of the two architectures is compared within the scope of regulatory reporting; the scope is limited because a big data management system has a wide and complex range of use cases. The comparison is carried out through several tests and evaluations to ensure that the proposed architectures meet the minimum standards of a Data Lakehouse and fulfill the needs of Bank X. The results show that both the MinIO-based and the HDFS-based architectures have their own advantages and disadvantages. However, the MinIO-based architecture performs better on most test cases. Hence, the MinIO-based architecture is the best-performing architecture within the scope of regulatory reporting at Bank X.