Data Transformation Model For Addressing Incomplete And Inconsistent Quality Issues Of Big Data

Bibliographic Details
Main Author: Onyeabor, Grace Amina
Format: Thesis
Language: English
Published: 2024
Subjects: T58.5-58.64 Information technology
Online Access: https://etd.uum.edu.my/11184/1/depositpermission-900601.pdf
https://etd.uum.edu.my/11184/2/s900601_01.pdf
https://etd.uum.edu.my/11184/
Institution: Universiti Utara Malaysia
Citation: Onyeabor, Grace Amina (2024) Data Transformation Model For Addressing Incomplete And Inconsistent Quality Issues Of Big Data. Doctoral thesis, Universiti Utara Malaysia.
Collection: UUM Electronic Theses, Institutional Repository, UUM Library (not peer reviewed)

Description:
Data Quality (DQ) assessment remains one of the major challenges for Big Data (BD) due to the complexity of handling large volumes of data. Traditional data transformation methods such as Extract-Transform-Load (ETL) draw on data sources from a diverse range of devices and locations, which produces incomplete and inconsistent data that may lead to wrong insights and decisions. DQ is therefore vital for the effective operation and management of BD, and understanding its many features, from its definition to its various dimensions, is essential for designing techniques and procedures that improve it.

This research focuses on two dimensions of DQ: completeness and consistency. First, an enhanced data transformation model, 2CsDQT, is proposed to assess and improve BD quality. A new algorithm based on ontology and clustering methods identifies and corrects incomplete and inconsistent data, addressing the availability and comprehensiveness of data, the similarity between data items, and missing specific attributes. Second, a clustering technique is used to analyse DQ and to improve it using the results of the 2CsDQT model: complete and consistent data are grouped into clusters, and the algorithm predicts, from the values of each incomplete or inconsistent record, the cluster to which it should be added. The model was evaluated and benchmarked against existing data transformation techniques from the literature. The results show that 2CsDQT improves BD quality and outperforms previously proposed methods: on the datasets of two different test cases, its completeness and consistency results exceed those of related articles and benchmark studies in the literature.

The theoretical contribution of this research is insight into the importance of DQ issues in BD and the effect of inconsistency and incompleteness on BD applications.
The practical contribution is the provision of enhanced data transformation models for DQ, leading to better data analysis and strategic planning.
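
The description names completeness and consistency as the two DQ dimensions the model targets, and an ontology as one tool for correcting inconsistent data. The following is a minimal Python sketch of how such measurements could look, assuming records are dictionaries, completeness is the fraction of non-missing cells, and consistency is the fraction of values that already use a canonical term. The `ONTOLOGY` mapping and all function names are hypothetical illustrations, not the 2CsDQT implementation.

```python
from typing import Any

# Hypothetical toy ontology: maps variant spellings of an attribute value to a
# single canonical term. An illustrative assumption, not the thesis's ontology.
ONTOLOGY: dict[str, dict[str, str]] = {
    "gender": {"m": "male", "male": "male", "f": "female", "female": "female"},
}

Record = dict[str, Any]


def is_missing(value: Any) -> bool:
    """Treat None and empty strings as missing values."""
    return value is None or value == ""


def completeness(records: list[Record], attributes: list[str]) -> float:
    """Fraction of (record, attribute) cells that hold a non-missing value."""
    total = len(records) * len(attributes)
    if total == 0:
        return 1.0  # nothing to check, so nothing is incomplete
    filled = sum(1 for r in records for a in attributes if not is_missing(r.get(a)))
    return filled / total


def canonicalize(record: Record,
                 ontology: dict[str, dict[str, str]] = ONTOLOGY) -> Record:
    """Return a copy of the record with known variant values mapped to canonical terms."""
    out = dict(record)
    for attr, mapping in ontology.items():
        value = out.get(attr)
        if isinstance(value, str):
            # Unknown variants are left unchanged rather than guessed.
            out[attr] = mapping.get(value.strip().lower(), value)
    return out


def consistency(records: list[Record], attr: str,
                ontology: dict[str, dict[str, str]] = ONTOLOGY) -> float:
    """Fraction of non-missing values of `attr` that already use a canonical term."""
    canonical = set(ontology[attr].values())
    values = [r.get(attr) for r in records if not is_missing(r.get(attr))]
    if not values:
        return 1.0
    return sum(1 for v in values if v in canonical) / len(values)
```

On four records where one `age` is `None` and one `gender` is empty, `completeness(records, ["gender", "age"])` is 0.75; if the remaining `gender` values are `"M"`, `"female"` and `"male"`, `consistency` is 2/3 before `canonicalize` and 1.0 after it.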
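
The second stage in the description groups complete and consistent records into clusters and predicts, from an incomplete record's values, which cluster it belongs to. The record does not name the clustering method, so this sketch swaps in plain k-means (Lloyd's algorithm with deterministic farthest-point initialisation) and assumes numeric attributes with `None` marking a missing value. Matching on observed attributes only and filling gaps from the centroid are illustrative assumptions, and `kmeans` and `assign_and_impute` are hypothetical names.

```python
import math
from typing import Optional

Point = list[float]


def kmeans(points: list[tuple[float, ...]], k: int, iterations: int = 20) -> list[Point]:
    """Lloyd's k-means over complete numeric records with deterministic
    farthest-point initialisation; returns k centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iterations):
        groups: list[list[tuple[float, ...]]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def assign_and_impute(record: list[Optional[float]],
                      centroids: list[Point]) -> tuple[int, list[float]]:
    """Choose the nearest centroid using only the record's observed attributes,
    then fill each missing attribute (None) with that centroid's value."""
    observed = [i for i, v in enumerate(record) if v is not None]

    def partial_distance(c: Point) -> float:
        return math.sqrt(sum((record[i] - c[i]) ** 2 for i in observed))

    best = min(range(len(centroids)), key=lambda i: partial_distance(centroids[i]))
    filled = [c if v is None else v for v, c in zip(record, centroids[best])]
    return best, filled


# Usage: two well-separated groups of complete records, then one incomplete record.
complete = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (10.0, 10.0), (10.2, 9.8), (9.9, 10.1)]
centroids = kmeans(complete, k=2)
print(assign_and_impute([None, 10.1], centroids))
```

A record such as `[None, 10.1]` is matched only on its second attribute and takes its first value from the centroid of the cluster it lands in. Centroid means keep the sketch short; a real pipeline would likely scale attributes before computing distances and might prefer a per-cluster median for imputation.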