Framework for a semantic data transformation in solving data quality issues in big data
Purpose - Today, organizations and companies are generating a tremendous amount of data. At the same time, an enormous amount of data is being received and acquired from various sources and stored, which brings us to the era of Big Data (BD). BD is a term used to describe massive datasets that...
Main Authors: | Onyeabor, Grace; Ta'a, Azman |
---|---|
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2017 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | http://repo.uum.edu.my/24488/1/SICONSEM%202017%2019%2021.pdf http://repo.uum.edu.my/24488/ |
Institution: | Universiti Utara Malaysia |
id |
my.uum.repo.24488 |
---|---|
record_format |
eprints |
spelling |
my.uum.repo.24488 2020-11-03T07:36:09Z http://repo.uum.edu.my/24488/ Framework for a semantic data transformation in solving data quality issues in big data Onyeabor, Grace Ta'a, Azman QA75 Electronic computers. Computer science Purpose - Today, organizations and companies are generating a tremendous amount of data. At the same time, an enormous amount of data is being received and acquired from various sources and stored, which brings us to the era of Big Data (BD). BD is a term used to describe massive datasets of diverse formats created at very high speed, the management of which is nearly impossible using traditional database management systems (Kanchi et al., 2015). With the dawn of BD, Data Quality (DQ) has become imperative. Volume, velocity and variety, the initial 3Vs of BD, are usually used to describe its main properties. But to extract value (another V property) and make BD effective and efficient for organizational decision making, the significance of another V of BD, veracity, is gradually coming to light. Veracity directly denotes inconsistency and DQ issues. Today, veracity in data analysis is the biggest challenge when compared to other aspects such as volume and velocity. Trusting the data acquired goes a long way in implementing decisions from an automated decision-making system, and veracity helps to validate the data acquired (Agarwal, Ravikumar, & Saha, 2016). DQ represents an important issue in every business. To be successful, companies need high-quality data on inventory, supplies, customers, vendors and other vital enterprise information in order to run their data analysis applications (e.g. decision support systems, data mining, customer relationship management) efficiently and produce accurate results (McAfee & Brynjolfsson, 2012). During the transformation of a huge volume of data, there might be data mismatch, miscalculation and/or loss of useful data that leads to an unsuccessful data transformation (Tesfagiorgish & JunYi, 2015), which will in turn lead to poor data quality. The addition of external data, particularly RDF data, increases the challenges for data transformation when compared with the traditional transformation process. For example, a drawback of using BD in the business analysis process is that the data is almost schema-less, and RDF data contains poor or complex schema. Traditional data transformation tools are not able to process such inconsistent and heterogeneous data because they do not support semantic-aware data, they are entirely schema-dependent and they do not focus on expressive semantic relationships to integrate data from different sources. Thus, BD requires more powerful tools to transform data semantically. While the research in this area so far offers different frameworks, to the best of the researchers' knowledge, not much research has been done in relation to the transformation of DQ in BD. What has been done has not gone beyond cleansing incoming data generally (Merino et al., 2016). The proposed framework presents a method for analysing DQ using BD from various domains and applying semantic technologies in the ETL transformation stage to create a semantic model that enables quality in the data. 2017-12-05 Conference or Workshop Item PeerReviewed application/pdf en http://repo.uum.edu.my/24488/1/SICONSEM%202017%2019%2021.pdf Onyeabor, Grace and Ta'a, Azman (2017) Framework for a semantic data transformation in solving data quality issues in big data. In: Sintok International Conference on Social Science and Management (SICONSEM 2017), 5 December 2017, Adya Hotel, Langkawi Island, Kedah, Malaysia. |
institution |
Universiti Utara Malaysia |
building |
UUM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Utara Malaysia |
content_source |
UUM Institutional Repository |
url_provider |
http://repo.uum.edu.my/ |
language |
English |
topic |
QA75 Electronic computers. Computer science |
spellingShingle |
QA75 Electronic computers. Computer science Onyeabor, Grace Ta'a, Azman Framework for a semantic data transformation in solving data quality issues in big data |
description |
Purpose - Today, organizations and companies are generating a tremendous amount of data. At the same time, an enormous amount of data is being received and acquired from various sources and stored, which brings us to the era of Big Data (BD). BD is a term used to describe massive datasets of diverse formats created at very high speed, the management of which is nearly impossible using traditional database management systems (Kanchi et al., 2015). With the dawn of BD, Data Quality (DQ) has become imperative. Volume, velocity and variety, the initial 3Vs of BD, are usually used to describe its main properties. But to extract value (another V property) and make BD effective and efficient for organizational decision making, the significance of another V of BD, veracity, is gradually coming to light. Veracity directly denotes inconsistency and DQ issues. Today, veracity in data analysis is the biggest challenge when compared to other aspects such as volume and velocity. Trusting the data acquired goes a long way in implementing decisions from an automated decision-making system, and veracity helps to validate the data acquired (Agarwal, Ravikumar, & Saha, 2016). DQ represents an important issue in every business. To be successful, companies need high-quality data on inventory, supplies, customers, vendors and other vital enterprise information in order to run their data analysis applications (e.g. decision support systems, data mining, customer relationship management) efficiently and produce accurate results (McAfee & Brynjolfsson, 2012). During the transformation of a huge volume of data, there might be data mismatch, miscalculation and/or loss of useful data that leads to an unsuccessful data transformation (Tesfagiorgish & JunYi, 2015), which will in turn lead to poor data quality. The addition of external data, particularly RDF data, increases the challenges for data transformation when compared with the traditional transformation process. For example, a drawback of using BD in the business analysis process is that the data is almost schema-less, and RDF data contains poor or complex schema. Traditional data transformation tools are not able to process such inconsistent and heterogeneous data because they do not support semantic-aware data, they are entirely schema-dependent and they do not focus on expressive semantic relationships to integrate data from different sources. Thus, BD requires more powerful tools to transform data semantically. While the research in this area so far offers different frameworks, to the best of the researchers' knowledge, not much research has been done in relation to the transformation of DQ in BD. What has been done has not gone beyond cleansing incoming data generally (Merino et al., 2016). The proposed framework presents a method for analysing DQ using BD from various domains and applying semantic technologies in the ETL transformation stage to create a semantic model that enables quality in the data. |
format |
Conference or Workshop Item |
author |
Onyeabor, Grace Ta'a, Azman |
author_facet |
Onyeabor, Grace Ta'a, Azman |
author_sort |
Onyeabor, Grace |
title |
Framework for a semantic data transformation in solving data quality issues in big data |
title_short |
Framework for a semantic data transformation in solving data quality issues in big data |
title_full |
Framework for a semantic data transformation in solving data quality issues in big data |
title_fullStr |
Framework for a semantic data transformation in solving data quality issues in big data |
title_full_unstemmed |
Framework for a semantic data transformation in solving data quality issues in big data |
title_sort |
framework for a semantic data transformation in solving data quality issues in big data |
publishDate |
2017 |
url |
http://repo.uum.edu.my/24488/1/SICONSEM%202017%2019%2021.pdf http://repo.uum.edu.my/24488/ |
_version_ |
1684655797173223424 |