An efficient approach for data-duplication detection based on RDBMS

Bibliographic Details
Main Authors: Chanhom K., Natwichai J.
Format: Conference or Workshop Item
Language: English
Published: 2014
Online Access: http://www.scopus.com/inward/record.url?eid=2-s2.0-79960398890&partnerID=40&md5=f4fa768c0206dcfec0556f50648b98af
http://cmuir.cmu.ac.th/handle/6653943832/1559
Institution: Chiang Mai University
Description
Summary: Data duplication is one of the most important issues in information system management. Instead of a single real-world object being stored as one entity in an information system, duplication can occur: more than one entity is stored to represent the same object. This problem can decrease the quality of service of information systems. In this paper, we propose an efficient approach to detecting duplication that is built on the RDBMS foundation. Our approach assumes that the data to be processed are already stored in the RDBMS. Thus, the proposed approach does not require the data to be imported into or exported from the storage. Such an approach also benefits from the query optimizer of the RDBMS. Experimental results on the TPC-H dataset are presented to validate the proposed work. © 2011 IEEE.
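
The summary does not describe the detection algorithm itself, so the following is only a minimal sketch of the general idea of finding duplicates inside the database rather than exporting the data. It is not the authors' method. It assumes a hypothetical `customer` table with `name` and `city` columns, uses SQLite through Python's standard `sqlite3` module as a stand-in RDBMS, and treats two rows as duplicates when their whitespace-trimmed, lower-cased values match.

```python
import sqlite3


def find_duplicate_groups(conn):
    """Return (name, city, copies) for every normalized value pair stored more than once.

    The grouping runs entirely inside the database engine, so the rows are
    never exported and the engine's query planner chooses the execution strategy.
    """
    query = """
        SELECT LOWER(TRIM(name)) AS norm_name,
               LOWER(TRIM(city)) AS norm_city,
               COUNT(*)          AS copies
        FROM customer
        GROUP BY LOWER(TRIM(name)), LOWER(TRIM(city))
        HAVING COUNT(*) > 1
        ORDER BY norm_name
    """
    return conn.execute(query).fetchall()


# Hypothetical sample data: the first two rows describe the same customer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer (name, city) VALUES (?, ?)",
    [
        ("Alice Smith", "Chiang Mai"),
        ("alice smith ", "chiang mai"),
        ("Bob Lee", "Bangkok"),
    ],
)

duplicates = find_duplicate_groups(conn)
print(duplicates)
```

Exact matching after normalization is the simplest possible criterion. A realistic detector would likely need an approximate similarity measure, which this sketch deliberately leaves out.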