DATA LAKEHOUSE ARCHITECTURE USING APACHE ICEBERG FOR BIG DATA MANAGEMENT AT BANK X
This undergraduate thesis presents data architecture recommendations for managing big data in the form of an Apache Iceberg-compatible Data Lakehouse. The proposed data architecture is designed to solve problems that arise from the two-tier architecture approach that is currently being...
Main Author: | Mareta Putri Leiden, Nadia |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/85014 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:85014 |
---|---|
spelling |
id-itb.:85014 2024-08-19T13:12:32Z DATA LAKEHOUSE ARCHITECTURE USING APACHE ICEBERG FOR BIG DATA MANAGEMENT AT BANK X Mareta Putri Leiden, Nadia Indonesia Final Project Data Lakehouse, Apache Iceberg, HDFS, object storage, big data INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/85014 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
This undergraduate thesis presents data architecture recommendations for
managing big data in the form of an Apache Iceberg-compatible Data Lakehouse.
The proposed data architecture is designed to solve problems that arise from the
two-tier architecture approach currently implemented by Bank X to manage its
big data. The research begins by examining the current data architecture used by
Bank X and analyzing Bank X’s needs in managing big data. The recommended data
architecture is then designed specifically around those needs.
The proposed architectures are built on two storage systems, MinIO and HDFS,
together with other components such as a catalog, table format, file format,
compute engine, and storage engine. The performance of the proposed
architectures is compared within the scope of regulatory reporting; the scope is
limited because a big data management system has a wide and complex range of use
cases. The two architectures are compared through several testing and evaluation
processes to ensure that they meet the minimum standards of a Data Lakehouse and
the needs of Bank X. The test and evaluation results show that both the
MinIO-based and the HDFS-based architecture have their own advantages and
disadvantages. However, the MinIO-based architecture performs better in most
test cases. Hence, the MinIO-based architecture is the best-performing
architecture within the scope of regulatory reporting at Bank X. |
format |
Final Project |
author |
Mareta Putri Leiden, Nadia |
title |
DATA LAKEHOUSE ARCHITECTURE USING APACHE ICEBERG FOR BIG DATA MANAGEMENT AT BANK X |
url |
https://digilib.itb.ac.id/gdl/view/85014 |