CollaborEM: A self-supervised entity matching framework using multi-features collaboration

Entity Matching (EM) aims to identify whether two tuples refer to the same real-world entity and is well-known to be labor-intensive. It is a prerequisite to anomaly detection, as comparing the attribute values of two matched tuples from two different datasets provides one effective way to detect an...

Bibliographic Details
Main Authors:	GE, Congcong; WANG, Pengfei; CHEN, Lu; LIU, Xiaoze; ZHENG, Baihua; GAO, Yunjun
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2023
Subjects:	Entity matching; sentence feature; graph feature; self-supervised; anomaly detection; Artificial Intelligence and Robotics; Databases and Information Systems
Online Access:	https://ink.library.smu.edu.sg/sis_research/8341
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-9344
record_format	dspace
spelling	sg-smu-ink.sis_research-93442023-12-05T02:12:03Z CollaborEM: A self-supervised entity matching framework using multi-features collaboration GE, Congcong WANG, Pengfei CHEN, Lu LIU, Xiaoze ZHENG, Baihua GAO, Yunjun Entity Matching (EM) aims to identify whether two tuples refer to the same real-world entity and is well-known to be labor-intensive. It is a prerequisite to anomaly detection, as comparing the attribute values of two matched tuples from two different datasets provides one effective way to detect anomalies. Existing EM approaches, due to insufficient feature discovery or error-prone inherent characteristics, are not able to achieve stable performance. In this paper, we present CollaborEM, a self-supervised entity matching framework via multi-features collaboration. It is capable of (i) obtaining reliable EM results with zero human annotations and (ii) discovering adequate tuples’ features in a fault-tolerant manner. CollaborEM consists of two phases, i.e., automatic label generation (ALG) and collaborative EM training (CEMT). In the first phase, ALG is proposed to generate a set of positive tuple pairs and a set of negative tuple pairs. ALG guarantees the high quality of the generated tuples, and hence ensures the training quality of the subsequent CEMT. In the second phase, CEMT is introduced to learn the matching signals by discovering graph features and sentence features of tuples collaboratively. Extensive experimental results over eight real-world EM benchmarks show that CollaborEM outperforms all the existing unsupervised EM approaches and is comparable or even superior to the state-of-the-art supervised EM methods. 
2023-12-01T08:00:00Z text https://ink.library.smu.edu.sg/sis_research/8341 info:doi/10.1109/TKDE.2021.3134806 Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Entity matching sentence feature graph feature self-supervised anomaly detection Artificial Intelligence and Robotics Databases and Information Systems
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Entity matching; sentence feature; graph feature; self-supervised; anomaly detection; Artificial Intelligence and Robotics; Databases and Information Systems
spellingShingle	Entity matching sentence feature graph feature self-supervised anomaly detection Artificial Intelligence and Robotics Databases and Information Systems GE, Congcong WANG, Pengfei CHEN, Lu LIU, Xiaoze ZHENG, Baihua GAO, Yunjun CollaborEM: A self-supervised entity matching framework using multi-features collaboration
description	Entity Matching (EM) aims to identify whether two tuples refer to the same real-world entity and is well-known to be labor-intensive. It is a prerequisite to anomaly detection, as comparing the attribute values of two matched tuples from two different datasets provides one effective way to detect anomalies. Existing EM approaches, due to insufficient feature discovery or error-prone inherent characteristics, are not able to achieve stable performance. In this paper, we present CollaborEM, a self-supervised entity matching framework via multi-features collaboration. It is capable of (i) obtaining reliable EM results with zero human annotations and (ii) discovering adequate tuples’ features in a fault-tolerant manner. CollaborEM consists of two phases, i.e., automatic label generation (ALG) and collaborative EM training (CEMT). In the first phase, ALG is proposed to generate a set of positive tuple pairs and a set of negative tuple pairs. ALG guarantees the high quality of the generated tuples, and hence ensures the training quality of the subsequent CEMT. In the second phase, CEMT is introduced to learn the matching signals by discovering graph features and sentence features of tuples collaboratively. Extensive experimental results over eight real-world EM benchmarks show that CollaborEM outperforms all the existing unsupervised EM approaches and is comparable or even superior to the state-of-the-art supervised EM methods.
format	text
author	GE, Congcong WANG, Pengfei CHEN, Lu LIU, Xiaoze ZHENG, Baihua GAO, Yunjun
author_facet	GE, Congcong WANG, Pengfei CHEN, Lu LIU, Xiaoze ZHENG, Baihua GAO, Yunjun
author_sort	GE, Congcong
title	CollaborEM: A self-supervised entity matching framework using multi-features collaboration
title_short	CollaborEM: A self-supervised entity matching framework using multi-features collaboration
title_full	CollaborEM: A self-supervised entity matching framework using multi-features collaboration
title_fullStr	CollaborEM: A self-supervised entity matching framework using multi-features collaboration
title_full_unstemmed	CollaborEM: A self-supervised entity matching framework using multi-features collaboration
title_sort	collaborem: a self-supervised entity matching framework using multi-features collaboration
publisher	Institutional Knowledge at Singapore Management University
publishDate	2023
url	https://ink.library.smu.edu.sg/sis_research/8341
_version_	1784855640900894720
