An efficient approach for data-duplication detection based on RDBMS

Data duplication is one of the most important issues in information system management. Instead of storing a single real-world object as one entity, an information system may store more than one entity representing that object. This problem can dec...

Bibliographic Details
Main Authors: Kiettisak Chanhom, Juggapong Natwichai
Format: Conference Proceeding
Published: 2011 (added to the repository in 2018)
Subjects: Computer Science
Online Access: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=79960398890&origin=inward
http://cmuir.cmu.ac.th/jspui/handle/6653943832/49880
Institution: Chiang Mai University
id th-cmuir.6653943832-49880
record_format dspace
spelling th-cmuir.6653943832-49880 2018-09-04T04:19:40Z Computer Science 2011-07-21 Conference Proceeding 2-s2.0-79960398890 10.1109/JCSSE.2011.5930142
building Chiang Mai University Library
country Thailand
collection CMU Intellectual Repository
topic Computer Science
description Data duplication is one of the most important issues in information system management. Instead of storing a single real-world object as one entity, an information system may store more than one entity representing that object. This problem can decrease the quality of service of information systems. In this paper, we propose an efficient approach to detecting duplication that is built on the RDBMS. Our approach assumes that the data to be processed are already stored in the RDBMS. Thus, the proposed approach does not require the data to be imported into or exported from the storage, and it also benefits from the query optimizer of the RDBMS. Experimental results on the TPC-H dataset are presented to validate the proposed work. © 2011 IEEE.
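
The abstract does not spell out the detection algorithm, so the following is only a minimal sketch of the general idea it states: running duplicate detection as a query inside the database, so that no data is exported and the engine's query optimizer plans the work. Everything specific here is an assumption made for illustration and is not taken from the paper: SQLite stands in for the RDBMS, the `customer` table and its columns are hypothetical (the paper's experiments use TPC-H), and exact matching on lightly normalized values is a deliberately simple stand-in for whatever matching rule the authors use.

```python
import sqlite3


def find_duplicates(conn):
    """Return (normalized_name, normalized_phone, copies) for every group
    of rows that share the same normalized values."""
    # The grouping and counting run inside the database engine;
    # only the duplicate groups are returned to the application.
    query = """
        SELECT LOWER(TRIM(name))       AS norm_name,
               REPLACE(phone, '-', '') AS norm_phone,
               COUNT(*)                AS copies
        FROM customer
        GROUP BY LOWER(TRIM(name)), REPLACE(phone, '-', '')
        HAVING COUNT(*) > 1
        ORDER BY norm_name
    """
    return conn.execute(query).fetchall()


# Hypothetical sample data: the first two rows describe the same person.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)"
)
conn.executemany(
    "INSERT INTO customer (name, phone) VALUES (?, ?)",
    [
        ("Alice Smith", "555-0100"),
        ("alice smith ", "5550100"),
        ("Bob Jones", "555-0199"),
    ],
)

duplicates = find_duplicates(conn)
print(duplicates)  # [('alice smith', '5550100', 2)]
```

Because the records never leave the database, the only data transferred to the application is the list of duplicate groups, which is the property the abstract emphasizes.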