DATA VALIDATION METHOD IN MIGRATING DATA PIPELINES IN A DATA WAREHOUSING ENVIRONMENT USING DATA LINEAGE

Bibliographic Details
Main Author: Farras Aqila, Aisyah
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/78296
Institution: Institut Teknologi Bandung
Description
Summary: When data pipelines are migrated from one platform to another, the data and processes in the migrated system must be verified to be identical to those in the legacy system. When discrepancies are found, locating the root cause takes considerable time and effort, especially if the pipeline is complex. Data lineage can help by determining which input data produce a particular set of output data. This research develops a data validation method that identifies the specific values in the input data associated with errors in the output data. The method consists of four steps: (1) surrogate key handling, which excludes surrogate keys when comparing data warehouses; (2) error data detection, which applies set difference operations between the data warehouses; (3) analysis of error data with lineage tracing, which traces the erroneous output back to the input data that cause it; and (4) pattern finding, which checks whether errors occur only in data with certain values. Step (3) uses the lineage tracing algorithm of Cui et al. (2000). Step (4) builds on the analysis by Alberini (2021) and an open problem posed by Ikeda & Widom (2009). Testing shows that the developed method is able to identify the values in the input data that produce errors in the output data. The developed application is also able to execute the method, with some adjustments.
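
The four steps in the summary can be illustrated with a small Python sketch. It is only a toy under stated assumptions, not the thesis implementation: the table layout, the column names (`sk`, `store`, `currency`, `amount`, `total`), and all function names are hypothetical; the pipeline is assumed to be a single group-by sum; and the lineage step is a simplified stand-in (the lineage of an aggregate row is taken to be the input rows in its group), not the full algorithm of Cui et al. (2000).

```python
# Toy sketch of the four-step validation method. All names and data are
# hypothetical; the pipeline is assumed to be "sum amount per store".

def strip_surrogate_keys(rows, surrogate_keys):
    """Step 1: drop surrogate key columns and return rows as a set of tuples."""
    return {
        tuple(sorted((c, v) for c, v in row.items() if c not in surrogate_keys))
        for row in rows
    }

def detect_errors(legacy, migrated, surrogate_keys):
    """Step 2: set difference in both directions between the two warehouses."""
    old = strip_surrogate_keys(legacy, surrogate_keys)
    new = strip_surrogate_keys(migrated, surrogate_keys)
    return old - new, new - old  # (missing in migrated, unexpected in migrated)

def trace_lineage(error_rows, source, group_col):
    """Step 3 (simplified): input rows belonging to the groups of the error rows."""
    keys = {dict(row)[group_col] for row in error_rows}
    return [r for r in source if r[group_col] in keys]

def find_patterns(lineage, source):
    """Step 4: columns where all error-lineage rows share one value
    that never appears in the remaining (clean) input rows."""
    clean = [r for r in source if r not in lineage]
    patterns = {}
    for col in (source[0].keys() if source else []):
        values = {r[col] for r in lineage}
        if len(values) == 1:
            (value,) = values
            if all(r[col] != value for r in clean):
                patterns[col] = value
    return patterns

# Hypothetical input data of the pipeline.
source = [
    {"id": 1, "store": "A", "currency": "USD", "amount": 10},
    {"id": 2, "store": "A", "currency": "USD", "amount": 20},
    {"id": 3, "store": "B", "currency": "JPY", "amount": 5},
    {"id": 4, "store": "C", "currency": "JPY", "amount": 7},
    {"id": 5, "store": "D", "currency": "USD", "amount": 3},
]
# Hypothetical outputs; surrogate keys ("sk") differ between the two systems.
legacy = [
    {"sk": 1, "store": "A", "total": 30},
    {"sk": 2, "store": "B", "total": 5},
    {"sk": 3, "store": "C", "total": 7},
    {"sk": 4, "store": "D", "total": 3},
]
migrated = [
    {"sk": 101, "store": "A", "total": 30},
    {"sk": 102, "store": "B", "total": 500},
    {"sk": 103, "store": "C", "total": 700},
    {"sk": 104, "store": "D", "total": 3},
]

missing, unexpected = detect_errors(legacy, migrated, {"sk"})
lineage = trace_lineage(missing | unexpected, source, "store")
patterns = find_patterns(lineage, source)
print(patterns)  # {'currency': 'JPY'}
```

In this toy case the differing surrogate keys do not register as errors, the set differences isolate stores B and C, the trace leads back to input rows 3 and 4, and the pattern step reports that the errors occur only where `currency` is `JPY`. A real pipeline with joins or several transformation stages would need a proper lineage tracing procedure in place of `trace_lineage`.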