DESIGN AND IMPLEMENTATION OF STREAMING DATA PIPELINE FOR FINANCIAL TRANSACTIONS ON PT. XYZ
With the growing volume of data generated by humans, organizations increasingly rely on data analysis to improve business operations. Transforming raw data into useful information requires analysis, and before raw data can be analyzed it must first be collected, cleaned, transformed, and loaded. A data pipeline is the set of tools that carries data through these processes so that it becomes available for analysis. At PT. XYZ, these processes take at least one day, which delays downstream activities such as ad-hoc data analysis and fraud detection. In this final project, the author proposes a streaming data pipeline to shorten this data collection delay. The proposed solution takes data from a PostgreSQL database, delivers it to a Google BigQuery data warehouse, and uses Apache Kafka as the streaming platform. The data processed in this context consists of financial transaction data and personal customer data from PT. XYZ. The implementation is subject to constraints, such as the pipeline being limited to a single virtual machine. Development is divided into four main stages: defining requirements, designing the architecture, implementing the design, and testing. Testing covers both functional and non-functional requirements through an end-to-end data processing simulation. The resulting streaming data pipeline uses Debezium, Apache Kafka, and the Confluent Google BigQuery Sink Connector. Testing of the implemented pipeline shows that it meets both the functional and non-functional requirements; as a comparison, collecting 500 thousand rows of data takes 2 to 3 minutes, shortening the data collection process from one day to a matter of minutes.
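The abstract describes a change-data-capture (CDC) architecture: Debezium reads row-level changes from PostgreSQL, publishes them as events to Apache Kafka topics, and the Confluent Google BigQuery Sink Connector streams those topics into BigQuery. As a rough illustration of how such a pipeline is typically wired together on a single Kafka Connect worker, the sketch below registers the two connectors through the Kafka Connect REST API. The hostnames, credentials, table names, topic prefix, GCP project, and dataset are placeholder assumptions rather than values taken from the thesis, and the exact configuration keys depend on the connector versions used.

```python
# Minimal sketch: register a Debezium PostgreSQL source connector and a
# BigQuery sink connector on a Kafka Connect worker via its REST API.
# All hostnames, names, and credentials below are illustrative assumptions.
import requests

CONNECT_URL = "http://localhost:8083/connectors"  # Kafka Connect REST endpoint

# 1) Debezium source connector: captures PostgreSQL changes into Kafka topics.
debezium_source = {
    "name": "pg-transactions-source",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",
        "database.hostname": "postgres.internal",   # assumed host
        "database.port": "5432",
        "database.user": "debezium",
        "database.password": "********",
        "database.dbname": "transactions_db",        # assumed database name
        "topic.prefix": "pt_xyz",                     # Kafka topic namespace
        "table.include.list": "public.transactions,public.customers",
    },
}

# 2) BigQuery sink connector: streams those Kafka topics into the warehouse.
bigquery_sink = {
    "name": "bigquery-sink",
    "config": {
        "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
        "topics": "pt_xyz.public.transactions,pt_xyz.public.customers",
        "project": "pt-xyz-analytics",                # assumed GCP project
        "defaultDataset": "raw_streaming",            # assumed BigQuery dataset
        "keyfile": "/secrets/bigquery-sa.json",       # service-account key path
        "sanitizeTopics": "true",
        "autoCreateTables": "true",
    },
}

# Register both connectors with the Kafka Connect worker.
for connector in (debezium_source, bigquery_sink):
    resp = requests.post(CONNECT_URL, json=connector, timeout=30)
    resp.raise_for_status()
    print(f"created connector: {connector['name']}")
```

Running both connectors on a single Connect worker is consistent with the one-virtual-machine constraint mentioned in the abstract, at the cost of having no fault tolerance for the worker itself.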
Saved in:
Main Author: Faizal Aziz, Ismail
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/79653
Subjects: streaming data pipeline, ETL, Kafka
Institution: Institut Teknologi Bandung
Language: | Indonesia |
id |
id-itb.:79653 |
---|---|
spelling |
id-itb.:796532024-01-14T23:54:36ZDESIGN AND IMPLEMENTATION OF STREAMING DATA PIPELINE FOR FINANCIAL TRANSACTIONS ON PT. XYZ Faizal Aziz, Ismail Indonesia Final Project streaming data pipeline, ETL, Kafka INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/79653 With the increasing volume of data generated by humans, organizations are increasingly relying on data analysis to improve business operations. The process of transforming raw data into useful information requires data analysis. Before raw data can be analyzed, the processes of data collection, cleaning, transformation, and loading are necessary. A data pipeline is a set of tools that function to deliver data through these three processes before it can be analyzed. At PT. XYZ, these three processes take a minimum of one day. This hinders downstream processes such as ad-hoc data analysis, fraud detection, and others. In this final project, the author proposes a streaming data pipeline system to shorten the data collection gap. The proposed solution sources data from the PostgreSQL database, with the data destination being the Google BigQuery data warehouse, and the streaming platform tool being Apache Kafka. In this context, the processed data includes financial transaction data and personal customer data from PT. XYZ. In its implementation, there are limitations such as the pipeline can only use one virtual machine. The data pipeline development process is divided into four main stages: defining requirements, designing the architecture, implementing the design, and testing. Testing conducted on the system includes functional and non-functional requirement testing through end-to-end data processing simulation. The result of the design and implementation of the streaming data pipeline is a pipeline that utilizes tools such as Debezium, Apache Kafka, and Confluent Google BigQuery Sink Connector. Testing on the implemented streaming data pipeline shows that it meets both functional and non- functional requirements. As a comparison, processing the collection of 500 thousand rows of data takes a gap time of 2 to 3 minutes. This shortens the data collection process from one day to a matter of minutes. text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |