Robust bipoly-matching for multi-granular entities

Bibliographic Details
Main Authors: LEE, Ween Jiann, TKACHENKO, Maksim, LAUW, Hady W.
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2021
Online Access: https://ink.library.smu.edu.sg/sis_research/6434
https://ink.library.smu.edu.sg/context/sis_research/article/7437/viewcontent/icdm21.pdf
Institution: Singapore Management University
Description
Summary: Entity matching across two data sources is a prevalent need in many domains, including e-commerce. Of interest is the scenario where entities have varying granularity, e.g., a coarse product category may match multiple finer categories. Previous work on one-to-many matching generally presumes that the 'one' necessarily comes from a designated source and the 'many' from the other source. In contrast, we propose a novel formulation that allows concurrent one-to-many matching in either direction. Beyond flexibility, we also seek matching that is more robust to noisy similarity values arising from diverse entity descriptions, by introducing the notions of receptivity and reclusivity. In addition to an optimal formulation, we also propose an efficient and effective heuristic. Experiments on multiple real-life datasets from e-commerce sources show that our proposed algorithms outperform the baselines.