An efficient approach for data-duplication detection based on RDBMS

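Data duplication is one of the most important issues in information system management. Instead of a single real-world object being stored as one entity in an information system, duplication, i.e., storing more than one entity that represents the same object, can occur. This problem can decrease the quality of service of information systems. In this paper, we propose an efficient approach to detect duplication based on the RDBMS foundation. Our approach assumes that the data to be processed are already stored in the RDBMS. Thus, the proposed approach does not require the data to be imported into or exported from the storage. The approach also benefits from the query optimizer of the RDBMS. Experimental results on the TPC-H dataset are presented to validate the proposed work. © 2011 IEEE.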
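
The record does not describe the authors' detection algorithm, so the following is only a minimal sketch of the general idea of detecting duplicates inside the RDBMS instead of exporting the data for external processing. It uses Python's standard sqlite3 module; the `customer` table, its columns, the function name `find_duplicate_pairs`, and the matching rule (same name after trimming and lower-casing, same phone digits after removing '-') are all hypothetical assumptions, not the method from the paper.

```python
import sqlite3


def find_duplicate_pairs(conn: sqlite3.Connection) -> list[tuple[int, int]]:
    """Return (custkey_a, custkey_b) pairs that look like the same customer.

    Hypothetical matching rule: names are equal after trimming and
    lower-casing, and phone numbers are equal after dropping '-'.
    The comparison runs as a single SQL query inside the database, so no
    rows are exported for processing and the engine's query planner
    chooses how to evaluate the self-join.
    """
    query = """
        SELECT a.custkey, b.custkey
        FROM customer AS a
        JOIN customer AS b
          ON a.custkey < b.custkey  -- report each unordered pair once
         AND lower(trim(a.name)) = lower(trim(b.name))
         AND replace(a.phone, '-', '') = replace(b.phone, '-', '')
        ORDER BY a.custkey, b.custkey
    """
    return [(a, b) for a, b in conn.execute(query)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical table loosely modeled on TPC-H's customer relation.
    conn.execute(
        "CREATE TABLE customer (custkey INTEGER PRIMARY KEY, name TEXT, phone TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [
            (1, "Customer#000000001", "25-989-741-2988"),
            (2, "customer#000000001 ", "259897412988"),
            (3, "Customer#000000002", "23-768-687-3665"),
            (4, "CUSTOMER#000000001", "25989741-2988"),
        ],
    )
    print(find_duplicate_pairs(conn))  # [(1, 2), (1, 4), (2, 4)]
```

Because the work is expressed as one declarative query, the engine's planner, not application code, decides how to evaluate the self-join; an index on the normalized expressions (SQLite supports indexes on expressions) may let it avoid comparing every pair of rows. Approximate matching, such as edit-distance thresholds, is not covered by this sketch.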

Bibliographic Details
Main Authors:	Chanhom K., Natwichai J.
Format:	Conference or Workshop Item
Language:	English
Published:	2011 (added to the repository 2014)
Online Access:	http://www.scopus.com/inward/record.url?eid=2-s2.0-79960398890&partnerID=40&md5=f4fa768c0206dcfec0556f50648b98af
	http://cmuir.cmu.ac.th/handle/6653943832/1559
Institution:	Chiang Mai University

Type:	Conference Paper
DOI:	10.1109/JCSSE.2011.5930142
Date issued:	2011
Date accessioned:	2014-08-29
Building:	Chiang Mai University Library
Country:	Thailand
Collection:	CMU Intellectual Repository

Similar Items