Robust bipoly-matching for multi-granular entities
Entity matching across two data sources is a prevalent need in many domains, including e-commerce. Of interest is the scenario where entities have varying granularity, e.g., a coarse product category may match multiple finer categories. Previous work in one-to-many matching generally presumes the `o...
Main Authors: | LEE, Ween Jiann; TKACHENKO, Maksim; LAUW, Hady W. |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2021 |
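Subjects: | entity resolution; matching; one-to-many; poly; bipoly; Databases and Information Systems; Data Science |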
Online Access: | https://ink.library.smu.edu.sg/sis_research/6434 https://ink.library.smu.edu.sg/context/sis_research/article/7437/viewcontent/icdm21.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-7437 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-7437; 2022-05-10T05:28:32Z; Robust bipoly-matching for multi-granular entities; LEE, Ween Jiann; TKACHENKO, Maksim; LAUW, Hady W.; Entity matching across two data sources is a prevalent need in many domains, including e-commerce. Of interest is the scenario where entities have varying granularity, e.g., a coarse product category may match multiple finer categories. Previous work in one-to-many matching generally presumes the 'one' necessarily comes from a designated source and the 'many' from the other source. In contrast, we propose a novel formulation that allows concurrent one-to-many bidirectional matching in any direction. Beyond flexibility, we also seek matching that is more robust to noisy similarity values arising from diverse entity descriptions, by introducing receptivity and reclusivity notions. In addition to an optimal formulation, we also propose an efficient and performant heuristic. Experiments on multiple real-life datasets from e-commerce sources showcase the effectiveness and outperformance of our proposed algorithms over baselines.; 2021-12-01T08:00:00Z; text; application/pdf; https://ink.library.smu.edu.sg/sis_research/6434; info:doi/10.1109/ICDM51629.2021.00143; https://ink.library.smu.edu.sg/context/sis_research/article/7437/viewcontent/icdm21.pdf; http://creativecommons.org/licenses/by-nc-nd/4.0/; Research Collection School Of Computing and Information Systems; eng; Institutional Knowledge at Singapore Management University; entity resolution; matching; one-to-many; poly; bipoly; Databases and Information Systems; Data Science |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
entity resolution; matching; one-to-many; poly; bipoly; Databases and Information Systems; Data Science |
description |
Entity matching across two data sources is a prevalent need in many domains, including e-commerce. Of interest is the scenario where entities have varying granularity, e.g., a coarse product category may match multiple finer categories. Previous work in one-to-many matching generally presumes the 'one' necessarily comes from a designated source and the 'many' from the other source. In contrast, we propose a novel formulation that allows concurrent one-to-many bidirectional matching in any direction. Beyond flexibility, we also seek matching that is more robust to noisy similarity values arising from diverse entity descriptions, by introducing receptivity and reclusivity notions. In addition to an optimal formulation, we also propose an efficient and performant heuristic. Experiments on multiple real-life datasets from e-commerce sources showcase the effectiveness and outperformance of our proposed algorithms over baselines. |
format |
text |
author |
LEE, Ween Jiann; TKACHENKO, Maksim; LAUW, Hady W. |
title |
Robust bipoly-matching for multi-granular entities |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2021 |
url |
https://ink.library.smu.edu.sg/sis_research/6434 https://ink.library.smu.edu.sg/context/sis_research/article/7437/viewcontent/icdm21.pdf |