DESIGN AND IMPLEMENTATION OF STREAMING DATA PIPELINE FOR FINANCIAL TRANSACTIONS ON PT. XYZ
With the growing volume of data generated by humans, organizations increasingly rely on data analysis to improve business operations. Transforming raw data into useful information requires analysis, and before raw data can be analyzed it must first be collected, cleaned, transformed, and loaded. A data pipeline is the set of tools that carries data through these processes so that it becomes available for analysis. At PT. XYZ, these processes take at least one day, which delays downstream activities such as ad-hoc data analysis and fraud detection. In this final project, the author proposes a streaming data pipeline to shorten this data collection delay. The proposed solution takes data from a PostgreSQL database, delivers it to a Google BigQuery data warehouse, and uses Apache Kafka as the streaming platform. The data processed in this context consists of financial transaction data and personal customer data from PT. XYZ. The implementation is subject to constraints, such as the pipeline being limited to a single virtual machine. Development is divided into four main stages: defining requirements, designing the architecture, implementing the design, and testing. Testing covers both functional and non-functional requirements through an end-to-end data processing simulation. The resulting streaming data pipeline uses Debezium, Apache Kafka, and the Confluent Google BigQuery Sink Connector. Testing of the implemented pipeline shows that it meets both the functional and non-functional requirements; as a comparison, collecting 500 thousand rows of data takes 2 to 3 minutes, shortening the data collection process from one day to a matter of minutes.
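The abstract describes a change-data-capture (CDC) architecture: Debezium reads row-level changes from PostgreSQL, publishes them as events to Apache Kafka topics, and the Confluent Google BigQuery Sink Connector streams those topics into BigQuery. As a rough illustration of how such a pipeline is typically wired together on a single Kafka Connect worker, the sketch below registers the two connectors through the Kafka Connect REST API. The hostnames, credentials, table names, topic prefix, GCP project, and dataset are placeholder assumptions rather than values taken from the thesis, and the exact configuration keys depend on the connector versions used.

```python
# Minimal sketch: register a Debezium PostgreSQL source connector and a
# BigQuery sink connector on a Kafka Connect worker via its REST API.
# All hostnames, names, and credentials below are illustrative assumptions.
import requests

CONNECT_URL = "http://localhost:8083/connectors"  # Kafka Connect REST endpoint

# 1) Debezium source connector: captures PostgreSQL changes into Kafka topics.
debezium_source = {
    "name": "pg-transactions-source",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",
        "database.hostname": "postgres.internal",   # assumed host
        "database.port": "5432",
        "database.user": "debezium",
        "database.password": "********",
        "database.dbname": "transactions_db",        # assumed database name
        "topic.prefix": "pt_xyz",                     # Kafka topic namespace
        "table.include.list": "public.transactions,public.customers",
    },
}

# 2) BigQuery sink connector: streams those Kafka topics into the warehouse.
bigquery_sink = {
    "name": "bigquery-sink",
    "config": {
        "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
        "topics": "pt_xyz.public.transactions,pt_xyz.public.customers",
        "project": "pt-xyz-analytics",                # assumed GCP project
        "defaultDataset": "raw_streaming",            # assumed BigQuery dataset
        "keyfile": "/secrets/bigquery-sa.json",       # service-account key path
        "sanitizeTopics": "true",
        "autoCreateTables": "true",
    },
}

# Register both connectors with the Kafka Connect worker.
for connector in (debezium_source, bigquery_sink):
    resp = requests.post(CONNECT_URL, json=connector, timeout=30)
    resp.raise_for_status()
    print(f"created connector: {connector['name']}")
```

Running both connectors on a single Connect worker is consistent with the one-virtual-machine constraint mentioned in the abstract, at the cost of having no fault tolerance for the worker itself.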
Saved in:
Main Author: Faizal Aziz, Ismail
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/79653
Subjects: streaming data pipeline, ETL, Kafka
Institution: Institut Teknologi Bandung
Language: | Indonesia |
id |
id-itb.:79653 |
---|---|
spelling |
id-itb.:796532024-01-14T23:54:36ZDESIGN AND IMPLEMENTATION OF STREAMING DATA PIPELINE FOR FINANCIAL TRANSACTIONS ON PT. XYZ Faizal Aziz, Ismail Indonesia Final Project streaming data pipeline, ETL, Kafka INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/79653 With the increasing volume of data generated by humans, organizations are increasingly relying on data analysis to improve business operations. The process of transforming raw data into useful information requires data analysis. Before raw data can be analyzed, the processes of data collection, cleaning, transformation, and loading are necessary. A data pipeline is a set of tools that function to deliver data through these three processes before it can be analyzed. At PT. XYZ, these three processes take a minimum of one day. This hinders downstream processes such as ad-hoc data analysis, fraud detection, and others. In this final project, the author proposes a streaming data pipeline system to shorten the data collection gap. The proposed solution sources data from the PostgreSQL database, with the data destination being the Google BigQuery data warehouse, and the streaming platform tool being Apache Kafka. In this context, the processed data includes financial transaction data and personal customer data from PT. XYZ. In its implementation, there are limitations such as the pipeline can only use one virtual machine. The data pipeline development process is divided into four main stages: defining requirements, designing the architecture, implementing the design, and testing. Testing conducted on the system includes functional and non-functional requirement testing through end-to-end data processing simulation. The result of the design and implementation of the streaming data pipeline is a pipeline that utilizes tools such as Debezium, Apache Kafka, and Confluent Google BigQuery Sink Connector. Testing on the implemented streaming data pipeline shows that it meets both functional and non- functional requirements. As a comparison, processing the collection of 500 thousand rows of data takes a gap time of 2 to 3 minutes. This shortens the data collection process from one day to a matter of minutes. text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |