DATA VALIDATION METHOD IN MIGRATING DATA PIPELINES IN A DATA WAREHOUSING ENVIRONMENT USING DATA LINEAGE

Bibliographic Details
Main Author: Farras Aqila, Aisyah
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/78296
Institution: Institut Teknologi Bandung
id id-itb.:78296
keywords data lineage; data pipeline; data validation; data warehouse; migration
timestamp 2023-09-18T22:17:22Z
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
description When data pipelines are migrated from one platform to another, the data and processes in the migrated system must be verified to be identical to those in the legacy system. If discrepancies are found, finding the root cause takes considerable time and effort, especially when the pipeline is complex. Data lineage can help by determining which input data produce a particular set of output data. In this research, a data validation method is developed that determines the specific values in the input data that are associated with errors in the output data. The method consists of four steps: (1) surrogate key handling, in which surrogate keys are excluded when comparing data warehouses; (2) error data detection, using set difference operations between the data warehouses; (3) analysis of error data with lineage tracing, which traces the data that cause the errors; and (4) pattern finding, which checks whether errors occur only in data with certain values. Step (3) uses the lineage tracing algorithm of Cui et al. (2000). Step (4) builds on the analysis of Alberini (2021) and an open problem posed by Ikeda & Widom (2009). Testing shows that the developed method is able to identify the values in the input data that produce errors in the output data. The developed application is also able to execute the method, with some adjustments.
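The four steps in the description can be illustrated with a minimal Python sketch. It is not the thesis implementation: the table layout, the column names (`sk`, `region`, `currency`, `amount`, `total`), and all function names are hypothetical, and the pipeline is assumed to be a single sum-by-region aggregation. Step (3) is a simplified stand-in for the Cui et al. (2000) algorithm, which covers more general views; here the lineage of an aggregated output row is taken to be every input row in the same group.

```python
from collections import Counter


def drop_surrogate_keys(rows, surrogate_keys):
    """Step 1: ignore surrogate key columns; return rows as a hashable set."""
    return {
        tuple(sorted((k, v) for k, v in row.items() if k not in surrogate_keys))
        for row in rows
    }


def detect_errors(legacy_rows, migrated_rows, surrogate_keys):
    """Step 2: set difference in both directions between the two warehouses."""
    legacy = drop_surrogate_keys(legacy_rows, surrogate_keys)
    migrated = drop_surrogate_keys(migrated_rows, surrogate_keys)
    missing = [dict(r) for r in legacy - migrated]     # only in legacy
    unexpected = [dict(r) for r in migrated - legacy]  # only in migrated
    return missing, unexpected


def trace_lineage(error_rows, source_rows, group_key):
    """Step 3 (simplified): input rows that share a group with an error row."""
    error_groups = {row[group_key] for row in error_rows}
    return [row for row in source_rows if row[group_key] in error_groups]


def find_patterns(error_inputs, all_inputs, column):
    """Step 4: values of `column` that occur only in error-producing inputs."""
    total = Counter(row[column] for row in all_inputs)
    in_error = Counter(row[column] for row in error_inputs)
    return {value for value, n in in_error.items() if n == total[value]}


# Hypothetical source table feeding both pipelines.
source = [
    {"region": "EU", "currency": "EUR", "amount": 10},
    {"region": "EU", "currency": "EUR", "amount": 5},
    {"region": "US", "currency": "USD", "amount": 7},
    {"region": "ASIA", "currency": "USD", "amount": 3},
]
# Legacy and migrated outputs: surrogate keys differ, and the EU total is wrong.
legacy = [
    {"sk": 1, "region": "EU", "total": 15},
    {"sk": 2, "region": "US", "total": 7},
    {"sk": 3, "region": "ASIA", "total": 3},
]
migrated = [
    {"sk": 101, "region": "US", "total": 7},
    {"sk": 102, "region": "EU", "total": 16},
    {"sk": 103, "region": "ASIA", "total": 3},
]

missing, unexpected = detect_errors(legacy, migrated, surrogate_keys={"sk"})
error_inputs = trace_lineage(missing + unexpected, source, group_key="region")
patterns = find_patterns(error_inputs, source, column="currency")
print(missing, unexpected, patterns)
```

In this toy data the surrogate keys differ on every row, yet only the EU total is reported as an error, and the pattern step narrows the suspect inputs to rows whose `currency` is `EUR`. A real warehouse comparison would more likely run the set differences in SQL rather than in memory.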