DATA VALIDATION METHOD IN MIGRATING DATA PIPELINES IN A DATA WAREHOUSING ENVIRONMENT USING DATA LINEAGE

Bibliographic Details
Main Author: Farras Aqila, Aisyah
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/78296
Institution: Institut Teknologi Bandung
Description
Summary: When data pipelines are migrated from one platform to another, the data and processes in the migrated system must be verified to be identical to those in the legacy system. If discrepancies are found, finding the root cause can take considerable time and effort, especially when the pipeline is complex. Data lineage can help by determining which input data produced a particular set of output data. This research develops a data validation method that identifies the specific values in the input data that are associated with errors in the output data. The method consists of four steps: (1) surrogate key handling, in which surrogate keys are excluded when comparing the data warehouses; (2) error data detection, using set difference operations between the data warehouses; (3) analysis of the error data with lineage tracing, to find the input data that cause the errors; and (4) pattern finding, which checks whether errors occur only in data with certain values. Step (3) uses the lineage tracing algorithm of Cui et al. (2000). Step (4) builds on the analysis of Alberini (2021) and an open problem posed by Ikeda & Widom (2009). Testing shows that the developed method is able to identify the values in the input data that produce errors in the output data. The application developed to implement it is also able to execute the method, with some adjustments.
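
The four steps in the summary can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes a hypothetical pipeline that sums `amount` per `region`, a surrogate key column named `sk`, and small in-memory tables. The function names and sample data are invented for illustration. The lineage step covers only the group-by case (the lineage of an aggregate row is every input row in its group), which is a simplification and not the full algorithm of Cui et al. (2000).

```python
def strip_surrogate_keys(rows, surrogate_keys):
    """Step 1: drop surrogate-key columns so only business data is compared."""
    return [{k: v for k, v in r.items() if k not in surrogate_keys} for r in rows]


def set_difference(legacy_rows, migrated_rows):
    """Step 2: rows found in one warehouse but not in the other."""
    legacy = {tuple(sorted(r.items())) for r in legacy_rows}
    migrated = {tuple(sorted(r.items())) for r in migrated_rows}
    missing = [dict(t) for t in sorted(legacy - migrated, key=repr)]
    unexpected = [dict(t) for t in sorted(migrated - legacy, key=repr)]
    return missing, unexpected


def trace_lineage(error_rows, input_rows, group_keys):
    """Step 3 (simplified): input rows that share grouping values with an error row."""
    error_groups = {tuple(r[k] for k in group_keys) for r in error_rows}
    return [r for r in input_rows
            if tuple(r[k] for k in group_keys) in error_groups]


def find_patterns(lineage_rows, input_rows, columns):
    """Step 4: (column, value) pairs shared by all error inputs and no clean input."""
    clean_rows = [r for r in input_rows if r not in lineage_rows]
    patterns = []
    for col in columns:
        error_values = {r[col] for r in lineage_rows}
        clean_values = {r[col] for r in clean_rows}
        if len(error_values) == 1 and not (error_values & clean_values):
            patterns.append((col, next(iter(error_values))))
    return patterns


# Hypothetical source data and the two warehouse outputs (total amount per region).
input_rows = [
    {"region": "West", "channel": "web", "amount": 10},
    {"region": "West", "channel": "web", "amount": 5},
    {"region": "North", "channel": "web", "amount": 3},
    {"region": "East", "channel": "store", "amount": 7},
]
legacy_output = [
    {"sk": 1, "region": "East", "total": 7},
    {"sk": 2, "region": "North", "total": 3},
    {"sk": 3, "region": "West", "total": 15},
]
migrated_output = [  # assumed bug: amounts from the "web" channel are doubled
    {"sk": 101, "region": "East", "total": 7},
    {"sk": 102, "region": "North", "total": 6},
    {"sk": 103, "region": "West", "total": 30},
]

legacy = strip_surrogate_keys(legacy_output, {"sk"})
migrated = strip_surrogate_keys(migrated_output, {"sk"})
missing, unexpected = set_difference(legacy, migrated)
lineage = trace_lineage(missing + unexpected, input_rows, ["region"])
patterns = find_patterns(lineage, input_rows, ["region", "channel", "amount"])
print(patterns)
```

In this toy case the surrogate keys differ in every row, so comparing the raw tables would flag all rows as errors. After step 1 only the North and West totals differ, and step 4 narrows the suspect inputs to those where `channel` is `web`.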