Data Harmonization for Heterogeneous Datasets in Big Data - A Conceptual Model

Data comes from machines, transactions, and social media, and is gigantic and disparate in nature. About 80% of today's data is unstructured, while the remainder is semistructured or structured. It is a big challenge for management to make efficient decisions at run time and also to s...

Bibliographic Details
Main Authors:	Kumar, G., Basri, S., Imam, A.A., Balogun, A.O.
Format:	Article
Published:	Springer Science and Business Media Deutschland GmbH 2020
Online Access:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85098178362&doi=10.1007%2f978-3-030-63322-6_61&partnerID=40&md5=b54cf25cfd96e3825e192f3b37d975b9 http://eprints.utp.edu.my/24643/
Institution:	Universiti Teknologi Petronas

id	my.utp.eprints.24643
record_format	eprints
spelling	my.utp.eprints.24643 2021-08-27T06:13:15Z Springer Science and Business Media Deutschland GmbH 2020 Article NonPeerReviewed https://www.scopus.com/inward/record.uri?eid=2-s2.0-85098178362&doi=10.1007%2f978-3-030-63322-6_61&partnerID=40&md5=b54cf25cfd96e3825e192f3b37d975b9 Kumar, G. and Basri, S. and Imam, A.A. and Balogun, A.O. (2020) Data Harmonization for Heterogeneous Datasets in Big Data - A Conceptual Model. Advances in Intelligent Systems and Computing, 1294, pp. 723-734. http://eprints.utp.edu.my/24643/
institution	Universiti Teknologi Petronas
building	UTP Resource Centre
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknologi Petronas
content_source	UTP Institutional Repository
url_provider	http://eprints.utp.edu.my/
description	Data comes from machines, transactions, and social media, and is gigantic and disparate in nature. About 80% of today's data is unstructured, while the remainder is semistructured or structured. It is a big challenge for management to make efficient decisions at run time and to store heterogeneous data with existing tools. Data harmonization can be used to solve the heterogeneity problem; the idea is to provide a uniform representation and remove all forms of heterogeneity from heterogeneous datasets. In recent studies, various models have been developed for integrating, mapping, and fusing structured and semistructured datasets, but no such model has been developed for structured, semistructured, and unstructured datasets together. Information extraction is a vital component for extracting data from textual datasets, which may come in different file formats, i.e., Excel, JSON, and text. To develop a textual data harmonization model for heterogeneous datasets comprising structured, semistructured, and unstructured data, based on phrase similarity techniques, the data first needs to be preprocessed using Natural Language Processing techniques such as Bag of Phrases, Parts of Speech, and so on. This paper therefore focuses on a conceptual data harmonization model based on a text similarity technique, which will help to blend structured, semistructured, and unstructured data. The selected phrases from heterogeneous datasets will go through training and testing using a Recurrent Neural Network. © 2020, The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG.
format	Article
author	Kumar, G. Basri, S. Imam, A.A. Balogun, A.O.
spellingShingle	Kumar, G. Basri, S. Imam, A.A. Balogun, A.O. Data Harmonization for Heterogeneous Datasets in Big Data - A Conceptual Model
author_facet	Kumar, G. Basri, S. Imam, A.A. Balogun, A.O.
author_sort	Kumar, G.
title	Data Harmonization for Heterogeneous Datasets in Big Data - A Conceptual Model
title_short	Data Harmonization for Heterogeneous Datasets in Big Data - A Conceptual Model
title_full	Data Harmonization for Heterogeneous Datasets in Big Data - A Conceptual Model
title_fullStr	Data Harmonization for Heterogeneous Datasets in Big Data - A Conceptual Model
title_full_unstemmed	Data Harmonization for Heterogeneous Datasets in Big Data - A Conceptual Model
title_sort	data harmonization for heterogeneous datasets in big data - a conceptual model
publisher	Springer Science and Business Media Deutschland GmbH
publishDate	2020
url	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85098178362&doi=10.1007%2f978-3-030-63322-6_61&partnerID=40&md5=b54cf25cfd96e3825e192f3b37d975b9 http://eprints.utp.edu.my/24643/
_version_	1738656618961174528

Similar Items