DATA VALIDATION METHOD IN MIGRATING DATA PIPELINES IN A DATA WAREHOUSING ENVIRONMENT USING DATA LINEAGE
When data pipelines are migrated from one platform to another, the data and processes in the migrated system must be verified to be identical to those in the legacy system. If discrepancies are found, locating the root cause takes considerable time and effort. This can be especially challenging if the pipeline is...
Main Author: | Farras Aqila, Aisyah |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/78296 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:78296 |
---|---|
spelling |
id-itb.:78296; 2023-09-18T22:17:22Z; DATA VALIDATION METHOD IN MIGRATING DATA PIPELINES IN A DATA WAREHOUSING ENVIRONMENT USING DATA LINEAGE; Farras Aqila, Aisyah; Indonesia; Final Project; keywords: data lineage, data pipeline, data validation, data warehouse, migration; INSTITUT TEKNOLOGI BANDUNG; https://digilib.itb.ac.id/gdl/view/78296; text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
When data pipelines are migrated from one platform to another, the data and
processes in the migrated system must be verified to be identical to those in the
legacy system. If discrepancies are found, locating the root cause takes considerable
time and effort, especially when the pipeline is complex. Data lineage can help by
determining which input data produce a particular set of output data.
This research develops a data validation method that identifies the specific values
in the input data that are associated with errors in the output data. The method
consists of four steps: (1) surrogate key handling, which excludes surrogate keys
when comparing data warehouses; (2) error data detection, which applies set
difference operations between the data warehouses; (3) analysis of the error data
with lineage tracing, which traces the input data that cause the errors; and
(4) pattern finding, which checks whether errors occur only in data with certain
values. Step (3) uses the lineage tracing algorithm of Cui et al. (2000). Step (4)
builds on the analysis by Alberini (2021) and an open problem posed by
Ikeda & Widom (2009).
Test results show that the method can identify the values in the input data that
produce errors in the output data. The developed application can also execute the
method, with some adjustments. |
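The four steps named in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the toy `SOURCE` table, the two stand-in pipelines, the seeded currency bug, and every function name are hypothetical. Lineage tracing is reduced to the simplest aggregate case (the lineage of a grouped output row is the set of source rows in that group), and pattern finding is a simple heuristic that keeps the column values present in the lineage of every erroneous output row and absent from all other source rows.

```python
# Hypothetical toy source table; all names and values are illustrative only.
SOURCE = [
    {"order_id": 1, "region": "west", "currency": "USD", "amount": 10},
    {"order_id": 2, "region": "west", "currency": "EUR", "amount": 20},
    {"order_id": 3, "region": "east", "currency": "USD", "amount": 5},
    {"order_id": 4, "region": "east", "currency": "USD", "amount": 7},
    {"order_id": 5, "region": "north", "currency": "EUR", "amount": 8},
]


def legacy_pipeline(source, sk_start=1):
    """Stand-in for the legacy pipeline: total amount per region."""
    totals = {}
    for row in source:
        totals[row["region"]] = totals.get(row["region"], 0) + row["amount"]
    return [
        {"sk": sk, "region": region, "total": total}
        for sk, (region, total) in enumerate(sorted(totals.items()), start=sk_start)
    ]


def migrated_pipeline(source):
    """Stand-in for the migrated pipeline, with a seeded bug: non-USD rows are dropped."""
    kept = [row for row in source if row["currency"] == "USD"]
    # Surrogate keys are generated from a different starting point.
    return legacy_pipeline(kept, sk_start=1001)


def strip_surrogate_keys(rows, surrogate_cols=("sk",)):
    """Step 1: drop surrogate keys and return rows as a set of hashable tuples."""
    return {
        tuple(sorted((k, v) for k, v in row.items() if k not in surrogate_cols))
        for row in rows
    }


def detect_errors(legacy, migrated):
    """Step 2: rows found in only one warehouse (set difference in both directions)."""
    return (legacy - migrated) | (migrated - legacy)


def trace_lineage(source, group_value, group_col="region"):
    """Step 3 (simplified): source rows that feed the aggregate row for one group."""
    return [row for row in source if row[group_col] == group_value]


def find_patterns(error_lineages, clean_rows):
    """Step 4 (heuristic): values in every error lineage and in no clean row."""
    if not error_lineages:
        return []
    shared = set.intersection(*(
        {(k, v) for row in lineage for k, v in row.items()}
        for lineage in error_lineages
    ))
    clean = {(k, v) for row in clean_rows for k, v in row.items()}
    return sorted(shared - clean)


def validate(source, group_col="region"):
    legacy = strip_surrogate_keys(legacy_pipeline(source))
    migrated = strip_surrogate_keys(migrated_pipeline(source))
    errors = detect_errors(legacy, migrated)
    error_keys = sorted({dict(row)[group_col] for row in errors})
    lineages = [trace_lineage(source, key, group_col) for key in error_keys]
    clean_rows = [row for row in source if row[group_col] not in error_keys]
    return error_keys, find_patterns(lineages, clean_rows)


print(validate(SOURCE))  # (['north', 'west'], [('currency', 'EUR')])
```

In this toy run the erroneous output rows belong to the `north` and `west` groups, and the only input value shared by both lineages and absent from the correct `east` group is `currency = EUR`, which matches the seeded bug. A real pipeline would need lineage tracing per transformation type rather than a single group-by rule.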
format |
Final Project |
author |
Farras Aqila, Aisyah |
title |
DATA VALIDATION METHOD IN MIGRATING DATA PIPELINES IN A DATA WAREHOUSING ENVIRONMENT USING DATA LINEAGE |
url |
https://digilib.itb.ac.id/gdl/view/78296 |