Validating multi-column schema matchings by type

Validation of multi-column schema matchings is essential for successful database integration. This task is especially difficult when the databases to be integrated contain little overlapping data, as is often the case in practice (e.g., customer bases of different companies). Based on the intuition...

Bibliographic Details
Main Authors:	DAI, Bing Tian, KOUDAS, Nick, SRIVASTAVA, Divesh, TUNG, Anthony K.H., VENKATASUBRAMANIAN, Suresh
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2008
Subjects:	Databases and Information Systems
Online Access:	https://ink.library.smu.edu.sg/sis_research/4167 https://ink.library.smu.edu.sg/context/sis_research/article/5170/viewcontent/Multi_column_schema_matchingICDE08.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-5170
record_format	dspace
spelling	sg-smu-ink.sis_research-5170 2018-11-22T02:52:43Z 2008-04-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/4167 info:doi/10.1109/ICDE.2008.4497420 https://ink.library.smu.edu.sg/context/sis_research/article/5170/viewcontent/Multi_column_schema_matchingICDE08.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Databases and Information Systems
description	Validation of multi-column schema matchings is essential for successful database integration. This task is especially difficult when the databases to be integrated contain little overlapping data, as is often the case in practice (e.g., customer bases of different companies). Based on the intuition that values present in different columns related by a schema matching will have similar "semantic type", and that this can be captured using distributions over values ("statistical types"), we develop a method for validating 1:1 and compositional schema matchings. Our technique is based on three key technical ideas. First, we propose a generic measure for comparing two columns matched by a schema matching, based on a notion of information-theoretic discrepancy that generalizes the standard geometric discrepancy; this provides the basis for 1:1 matching. Second, we present an algorithm for "splitting" the string values in a column to identify substrings that are likely to match with the values in another column; this enables (multi-column) 1:m schema matching. Third, our technique provides an invalidation certificate if it fails to validate a schema matching. We complement our conceptual and algorithmic contributions with an experimental study that demonstrates the effectiveness and efficiency of our technique on a variety of database schemas and data sets.
format	text
author	DAI, Bing Tian KOUDAS, Nick SRIVASTAVA, Divesh TUNG, Anthony K.H. VENKATASUBRAMANIAN, Suresh
author_sort	DAI, Bing Tian
title	Validating multi-column schema matchings by type
publisher	Institutional Knowledge at Singapore Management University
publishDate	2008
url	https://ink.library.smu.edu.sg/sis_research/4167 https://ink.library.smu.edu.sg/context/sis_research/article/5170/viewcontent/Multi_column_schema_matchingICDE08.pdf
_version_	1770574390762668032
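
Illustration (not part of the record): the description's central intuition is that two matched columns can share a "statistical type" even when they share no values. The sketch below shows that idea in a deliberately simplified form. It is not the paper's information-theoretic discrepancy measure; as a stand-in, it assumes each value is reduced to a coarse character-class shape and compares the two shape distributions with Jensen-Shannon divergence. The function names pattern, distribution, and js_divergence are hypothetical.

```python
import math
import re
from collections import Counter


def pattern(value: str) -> str:
    """Reduce a value to a coarse shape: digit runs -> '9', letter runs -> 'A'.

    This shape abstraction is an assumption made for illustration only.
    """
    shape = re.sub(r"[0-9]+", "9", value)
    return re.sub(r"[A-Za-z]+", "A", shape)


def distribution(column):
    """Empirical distribution over shapes for a non-empty column of strings."""
    if not column:
        raise ValueError("column must contain at least one value")
    counts = Counter(pattern(v) for v in column)
    total = sum(counts.values())
    return {shape: n / total for shape, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    keys = set(p) | set(q)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mix(a):
        return sum(a[k] * math.log2(a[k] / mix[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mix(p) + 0.5 * kl_to_mix(q)


# Two phone columns with no overlapping values, and one name column.
phones_a = ["555-1234", "555-9876"]
phones_b = ["212-0000", "646-1111"]
names = ["Alice Smith", "Bob Lee"]

same_type = js_divergence(distribution(phones_a), distribution(phones_b))
diff_type = js_divergence(distribution(phones_a), distribution(names))
print(same_type, diff_type)
```

In this toy setup the two phone columns have identical shape distributions despite sharing no values, while the phone and name columns have disjoint ones. A low score would support a proposed 1:1 matching and a high score would count against it; any threshold would have to be chosen by the user.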

Similar Items