DATA LAKEHOUSE ARCHITECTURE USING APACHE ICEBERG FOR BIG DATA MANAGEMENT AT BANK X

This undergraduate thesis presents data architecture recommendations for managing big data in the form of an Apache Iceberg-compatible data lakehouse. The proposed data architecture is designed to solve problems that arise from the two-tier architecture approach that is currently being...

Bibliographic Details
Main Author: Mareta Putri Leiden, Nadia
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/85014
Institution: Institut Teknologi Bandung
Record ID: id-itb.:85014
Record timestamp: 2024-08-19T13:12:32Z
Keywords: Data Lakehouse, Apache Iceberg, HDFS, object storage, big data
Building: Institut Teknologi Bandung Library
Collection: Digital ITB
Country: Indonesia
Description: This undergraduate thesis presents data architecture recommendations for managing big data in the form of an Apache Iceberg-compatible data lakehouse. The proposed architecture is designed to solve problems that arise from the two-tier architecture approach currently implemented by Bank X to manage its big data. The research begins by studying Bank X's current data architecture and analyzing its big data management needs. The recommended architecture is then designed specifically around those needs. Two architectures are proposed, based on two storage systems, MinIO and HDFS, together with other components such as the catalog, table format, file format, compute engine, and storage engine. The performance of the proposed architectures is compared within the scope of regulatory reporting; the scope is limited because a big data management system has a wide and complex range of use cases. The comparison is carried out through several tests and evaluations to ensure that the proposed architectures meet the minimum standards of a data lakehouse and fulfill Bank X's needs. The results show that both the MinIO-based and the HDFS-based architecture have their own advantages and disadvantages. However, the MinIO-based architecture performs better in most test cases. Hence, the MinIO-based architecture is the best-performing architecture within the scope of regulatory reporting at Bank X.
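As a rough illustration of how the two storage variants in the description might differ only in their storage layer, the sketch below builds catalog settings for an Iceberg table format on top of either MinIO or HDFS. It is not taken from the thesis: the use of Apache Spark as the compute engine, the `hadoop` catalog type, the catalog name `lakehouse`, and all host names, ports, and bucket paths are assumptions chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LakehouseVariant:
    """One storage-specific variant of the stack (all values are hypothetical)."""
    name: str
    warehouse_uri: str       # location of Iceberg data and metadata files
    extra_conf: tuple = ()   # storage-specific Hadoop filesystem settings


def spark_conf(variant, catalog="lakehouse"):
    """Return Spark settings that register an Iceberg catalog for one variant."""
    prefix = f"spark.sql.catalog.{catalog}"
    conf = {
        # Enables Iceberg's SQL extensions in Spark.
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the catalog; the catalog type is an assumption of this sketch.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hadoop",
        f"{prefix}.warehouse": variant.warehouse_uri,
    }
    conf.update(dict(variant.extra_conf))
    return conf


# MinIO exposes an S3-compatible API, so it is reached through the s3a connector.
MINIO = LakehouseVariant(
    name="minio",
    warehouse_uri="s3a://warehouse/",
    extra_conf=(
        ("spark.hadoop.fs.s3a.endpoint", "http://minio.example.internal:9000"),
        ("spark.hadoop.fs.s3a.path.style.access", "true"),
    ),
)

# HDFS needs no extra connector settings beyond the warehouse URI.
HDFS = LakehouseVariant(
    name="hdfs",
    warehouse_uri="hdfs://namenode.example.internal:8020/warehouse",
)

if __name__ == "__main__":
    for variant in (MINIO, HDFS):
        print(variant.name)
        for key, value in sorted(spark_conf(variant).items()):
            print(f"  {key} = {value}")
```

Under these assumptions the table format, file format, and compute engine settings stay identical across both variants, so a performance comparison isolates the storage layer.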