DESIGN AND IMPLEMENTATION OF STREAMING DATA PIPELINE FOR FINANCIAL TRANSACTIONS ON PT. XYZ

Bibliographic Details
Main Author: Faizal Aziz, Ismail
Format: Final Project
Language: Indonesia
Online Access:https://digilib.itb.ac.id/gdl/view/79653
Institution: Institut Teknologi Bandung
Language: Indonesia
id id-itb.:79653
spelling id-itb.:79653 2024-01-14T23:54:36Z
topic streaming data pipeline, ETL, Kafka
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description As the volume of data generated by people grows, organizations increasingly rely on data analysis to turn raw data into useful information and improve business operations. Before raw data can be analyzed, it must be collected, cleaned, transformed, and loaded. A data pipeline is the set of tools that carries data through these processes so that it is ready for analysis. At PT. XYZ, these processes take at least one day, which delays downstream work such as ad-hoc data analysis and fraud detection. In this final project, the author proposes a streaming data pipeline to shorten this data collection delay. The proposed solution takes data from a PostgreSQL source database, delivers it to a Google BigQuery data warehouse, and uses Apache Kafka as the streaming platform. The processed data consists of PT. XYZ's financial transaction data and personal customer data. The implementation is subject to constraints, notably that the pipeline must run on a single virtual machine. Development is divided into four main stages: defining requirements, designing the architecture, implementing the design, and testing. Testing covers both functional and non-functional requirements through an end-to-end data processing simulation. The resulting streaming data pipeline uses Debezium, Apache Kafka, and the Confluent Google BigQuery Sink Connector. Testing of the implemented pipeline shows that it satisfies both the functional and non-functional requirements: collecting 500 thousand rows of data takes 2 to 3 minutes end to end, shortening the data collection process from one day to a matter of minutes. A configuration sketch illustrating this connector setup appears after the record fields below.
format Final Project
author Faizal Aziz, Ismail
spellingShingle Faizal Aziz, Ismail
DESIGN AND IMPLEMENTATION OF STREAMING DATA PIPELINE FOR FINANCIAL TRANSACTIONS ON PT. XYZ
author_facet Faizal Aziz, Ismail
author_sort Faizal Aziz, Ismail
title DESIGN AND IMPLEMENTATION OF STREAMING DATA PIPELINE FOR FINANCIAL TRANSACTIONS ON PT. XYZ
title_short DESIGN AND IMPLEMENTATION OF STREAMING DATA PIPELINE FOR FINANCIAL TRANSACTIONS ON PT. XYZ
title_full DESIGN AND IMPLEMENTATION OF STREAMING DATA PIPELINE FOR FINANCIAL TRANSACTIONS ON PT. XYZ
title_fullStr DESIGN AND IMPLEMENTATION OF STREAMING DATA PIPELINE FOR FINANCIAL TRANSACTIONS ON PT. XYZ
title_full_unstemmed DESIGN AND IMPLEMENTATION OF STREAMING DATA PIPELINE FOR FINANCIAL TRANSACTIONS ON PT. XYZ
title_sort design and implementation of streaming data pipeline for financial transactions on pt. xyz
url https://digilib.itb.ac.id/gdl/view/79653
_version_ 1822996406952001536
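
The record names the tools used (Debezium, Apache Kafka, and the Confluent Google BigQuery Sink Connector) but not their configuration. The following is a minimal sketch, assuming a local single-node Kafka Connect worker, of how a Debezium PostgreSQL source connector and a BigQuery sink connector could be registered over the Kafka Connect REST API to reproduce the PostgreSQL -> Kafka -> BigQuery flow described in the abstract. All hostnames, credentials, table, topic, and dataset names are illustrative placeholders, the exact configuration keys vary across connector versions, and none of these values are taken from the project itself.

"""
Hypothetical sketch: register a Debezium PostgreSQL source connector and a
BigQuery sink connector with a single-node Kafka Connect worker, mirroring
the PostgreSQL -> Kafka -> BigQuery flow described in the abstract.
All hostnames, credentials, table and dataset names are placeholders.
"""
import requests

# Kafka Connect REST API of the (assumed) local worker.
CONNECT_URL = "http://localhost:8083/connectors"

# Debezium change-data-capture source: streams row-level changes from
# PostgreSQL tables into Kafka topics.
postgres_source = {
    "name": "pg-transactions-source",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres.internal.example",   # placeholder host
        "database.port": "5432",
        "database.user": "debezium",
        "database.password": "********",
        "database.dbname": "transactions_db",                # placeholder database
        "plugin.name": "pgoutput",                           # logical decoding plugin
        "table.include.list": "public.transactions,public.customers",  # placeholder tables
        "topic.prefix": "pgsrc",  # Debezium 2.x key; older releases use database.server.name
    },
}

# BigQuery sink: consumes the change topics and writes rows into a BigQuery dataset.
bigquery_sink = {
    "name": "bigquery-sink",
    "config": {
        "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
        "topics": "pgsrc.public.transactions,pgsrc.public.customers",
        "project": "example-gcp-project",                    # placeholder GCP project
        "defaultDataset": "warehouse",  # key name varies by sink connector version
        "keyfile": "/secrets/bigquery-sa.json",              # service-account credentials
        "autoCreateTables": "true",
    },
}

def register(connector: dict) -> None:
    """POST a connector definition to the Connect worker and report the result."""
    resp = requests.post(CONNECT_URL, json=connector, timeout=30)
    resp.raise_for_status()
    print(f"registered {connector['name']}: HTTP {resp.status_code}")

if __name__ == "__main__":
    register(postgres_source)
    register(bigquery_sink)

Because the project constrains the pipeline to a single virtual machine, both connectors would presumably run on one Kafka Connect worker alongside the Kafka broker; whether standalone or single-node distributed mode was used is not stated in the record.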