DATA VALIDATION METHOD IN MIGRATING DATA PIPELINES IN A DATA WAREHOUSING ENVIRONMENT USING DATA LINEAGE
| Field | Value |
|---|---|
| Main Author | |
| Format | Final Project |
| Language | Indonesian |
| Online Access | https://digilib.itb.ac.id/gdl/view/78296 |
| Institution | Institut Teknologi Bandung |
Summary:
When data pipelines are migrated from one platform to another, the data and processes in the migrated system must be verified to be identical to those in the legacy system. When discrepancies are found, locating their root cause takes considerable time and effort, especially in a complex pipeline. Data lineage can help with this task by determining which input data produce a particular set of output data.
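As a minimal illustration of how lineage links an output row back to the input rows that produced it, here is a Python sketch under stated assumptions: the base tables `sales` and `products`, the view `category_totals` (a join followed by a grouped sum), and the helper `trace_lineage` are all hypothetical. The tracing follows the general idea of lineage for views that join and then aggregate: keep the joined base rows whose group key matches the output row, then split them back into one set of rows per base table. It is an illustration only, not the implementation used in this work.

```python
from collections import defaultdict

# Hypothetical base tables (lists of dicts); names and values are illustrative only.
products = [
    {"product_id": 1, "category": "books"},
    {"product_id": 2, "category": "games"},
    {"product_id": 3, "category": "books"},
]
sales = [
    {"sale_id": 10, "product_id": 1, "amount": 5},
    {"sale_id": 11, "product_id": 2, "amount": 7},
    {"sale_id": 12, "product_id": 3, "amount": 4},
    {"sale_id": 13, "product_id": 1, "amount": 6},
]


def category_totals(sales, products):
    """Hypothetical view: join sales with products, then SUM(amount) GROUP BY category."""
    category_of = {p["product_id"]: p["category"] for p in products}
    totals = defaultdict(int)
    for s in sales:
        totals[category_of[s["product_id"]]] += s["amount"]
    return dict(totals)


def trace_lineage(category, sales, products):
    """Return the base rows that contributed to the view row for `category`.

    Keeps every joined (sale, product) pair whose group key equals the output
    row's key, then splits the pairs back into per-table row lists.
    """
    matching = [
        (s, p)
        for s in sales
        for p in products
        if s["product_id"] == p["product_id"] and p["category"] == category
    ]
    sales_part = [s for s, _ in matching]
    product_part = []
    for _, p in matching:
        if p not in product_part:  # a product row may join with several sales
            product_part.append(p)
    return sales_part, product_part


totals = category_totals(sales, products)
sales_rows, product_rows = trace_lineage("books", sales, products)
print(totals)  # {'books': 15, 'games': 7}
print([s["sale_id"] for s in sales_rows])  # [10, 12, 13]
```

If the migrated system reported a wrong total for `books`, only the three traced sales rows and two product rows would need inspection, rather than the whole input.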
This research develops a data validation method that determines the specific input data values associated with errors in the output data. The method consists of four steps:
(1) surrogate key handling: surrogate keys are excluded when the legacy and migrated data warehouses are compared;
(2) error data detection: set difference operations between the two data warehouses identify the erroneous rows;
(3) analysis of error data with lineage tracing: the erroneous rows are traced back to the input data that caused them;
(4) pattern finding: the traced data are checked to see whether the errors occur only in data with certain values.
Step (3) uses the lineage tracing algorithm of Cui et al. (2000). Step (4) builds on an analysis by Alberini (2021) and an open problem posed by Ikeda & Widom (2009).
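The four steps can be sketched end to end in Python on toy data. Everything here is a hypothetical assumption for illustration: the column names (`sk`, `customer`, `total`, `currency`), the tables, and the helper functions. In particular, `trace_to_source` is a stand-in that assumes each warehouse row derives from the source rows with the same customer; it does not implement the Cui et al. (2000) algorithm. Likewise, `find_patterns` is a deliberately simple heuristic, not the pattern-finding procedure developed in this work.

```python
# Hypothetical surrogate key column, assigned independently by each warehouse.
SURROGATE_KEYS = {"sk"}

legacy_dw = [
    {"sk": 1, "customer": "C1", "total": 100},
    {"sk": 2, "customer": "C2", "total": 250},
    {"sk": 3, "customer": "C3", "total": 80},
]
migrated_dw = [
    {"sk": 7, "customer": "C1", "total": 100},
    {"sk": 8, "customer": "C2", "total": 25},
    {"sk": 9, "customer": "C3", "total": 8},
]
source_orders = [
    {"order_id": 1, "customer": "C1", "currency": "USD", "amount": 100},
    {"order_id": 2, "customer": "C2", "currency": "IDR", "amount": 250},
    {"order_id": 3, "customer": "C3", "currency": "IDR", "amount": 80},
]


def strip_surrogate_keys(rows, surrogate_keys):
    """Step (1): drop surrogate keys and turn rows into hashable tuples."""
    return {
        tuple(sorted((k, v) for k, v in row.items() if k not in surrogate_keys))
        for row in rows
    }


def detect_errors(legacy, migrated, surrogate_keys):
    """Step (2): rows present in only one warehouse, via set difference both ways."""
    old = strip_surrogate_keys(legacy, surrogate_keys)
    new = strip_surrogate_keys(migrated, surrogate_keys)
    return old - new, new - old


def trace_to_source(error_rows, source_rows):
    """Step (3) stand-in (hypothetical): assumes each warehouse row derives from
    the source rows with the same customer. A real pipeline needs a proper
    lineage tracing query instead."""
    customers = {dict(row)["customer"] for row in error_rows}
    return [r for r in source_rows if r["customer"] in customers]


def find_patterns(error_inputs, all_inputs, columns):
    """Step (4), simplified: (column, value) pairs shared by every traced error
    input row and absent from every input row that did not lead to an error."""
    clean_inputs = [r for r in all_inputs if r not in error_inputs]
    patterns = []
    for col in columns:
        values = {r[col] for r in error_inputs}
        if len(values) == 1:
            (value,) = values
            if all(r[col] != value for r in clean_inputs):
                patterns.append((col, value))
    return patterns


only_legacy, only_migrated = detect_errors(legacy_dw, migrated_dw, SURROGATE_KEYS)
error_inputs = trace_to_source(only_legacy, source_orders)
print(find_patterns(error_inputs, source_orders, ["customer", "currency", "amount"]))
# prints [('currency', 'IDR')]
```

Removing the surrogate key before comparing lets rows that differ only in system-assigned keys (the `C1` rows above) match, while the two set differences separate legacy rows that are missing or changed in the migrated warehouse from rows that the migrated warehouse added or changed.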
Testing shows that the developed method can identify the input data values that produce errors in the output data. The developed application can also execute the method, with some adjustments.