CampER: An effective framework for privacy-aware deep entity resolution

Entity Resolution (ER) is a fundamental problem in data preparation. Standard deep ER methods have achieved state-of-the-art effectiveness, assuming that relations from different organizations are centrally stored. However, due to privacy concerns, it can be difficult to centralize data in practice, ren...

Bibliographic Details
Main Authors: GUO, Yuxiang, CHEN, Lu, ZHOU, Zhengjie, ZHENG, Baihua, FANG, Ziquan, ZHANG, Zhikun, MAO, Yuren, GAO, Yunjun
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2023
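Subjects: entity resolution; representation learning; similarity measurement; Databases and Information Systems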
Online Access: https://ink.library.smu.edu.sg/sis_research/8106
https://ink.library.smu.edu.sg/context/sis_research/article/9109/viewcontent/3580305.3599266.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-9109
record_format dspace
spelling sg-smu-ink.sis_research-9109 2024-02-16T02:54:26Z CampER: An effective framework for privacy-aware deep entity resolution GUO, Yuxiang CHEN, Lu ZHOU, Zhengjie ZHENG, Baihua FANG, Ziquan ZHANG, Zhikun MAO, Yuren GAO, Yunjun Entity Resolution (ER) is a fundamental problem in data preparation. Standard deep ER methods have achieved state-of-the-art effectiveness, assuming that relations from different organizations are centrally stored. However, due to privacy concerns, it can be difficult to centralize data in practice, rendering standard deep ER solutions inapplicable. Despite efforts to develop rule-based privacy-preserving ER methods, they often neglect subtle matching mechanisms and have poor effectiveness as a result. To bridge effectiveness and privacy, in this paper, we propose CampER, an effective framework for privacy-aware deep entity resolution. Specifically, we first design a training pair self-generation strategy to overcome the absence of manually labeled data in privacy-aware scenarios. Based on the self-constructed training pairs, we present a collaborative fine-tuning approach to learn the match-aware and uni-space individual tuple embeddings for accurate matching decisions. During the matching decision-making process, we first introduce a cryptographically secure approach to determine matches. Furthermore, we propose an order-preserving perturbation strategy to significantly accelerate the matching computation while guaranteeing the consistency of ER results. Extensive experiments on eight widely-used benchmark datasets demonstrate that CampER not only is comparable with the state-of-the-art standard deep ER solutions in effectiveness, but also preserves privacy.
2023-08-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/8106 info:doi/10.1145/3580305.3599266 https://ink.library.smu.edu.sg/context/sis_research/article/9109/viewcontent/3580305.3599266.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University entity resolution representation learning similarity measurement Databases and Information Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic entity resolution
representation learning
similarity measurement
Databases and Information Systems
description Entity Resolution (ER) is a fundamental problem in data preparation. Standard deep ER methods have achieved state-of-the-art effectiveness, assuming that relations from different organizations are centrally stored. However, due to privacy concerns, it can be difficult to centralize data in practice, rendering standard deep ER solutions inapplicable. Despite efforts to develop rule-based privacy-preserving ER methods, they often neglect subtle matching mechanisms and have poor effectiveness as a result. To bridge effectiveness and privacy, in this paper, we propose CampER, an effective framework for privacy-aware deep entity resolution. Specifically, we first design a training pair self-generation strategy to overcome the absence of manually labeled data in privacy-aware scenarios. Based on the self-constructed training pairs, we present a collaborative fine-tuning approach to learn the match-aware and uni-space individual tuple embeddings for accurate matching decisions. During the matching decision-making process, we first introduce a cryptographically secure approach to determine matches. Furthermore, we propose an order-preserving perturbation strategy to significantly accelerate the matching computation while guaranteeing the consistency of ER results. Extensive experiments on eight widely-used benchmark datasets demonstrate that CampER not only is comparable with the state-of-the-art standard deep ER solutions in effectiveness, but also preserves privacy.
format text
author GUO, Yuxiang
CHEN, Lu
ZHOU, Zhengjie
ZHENG, Baihua
FANG, Ziquan
ZHANG, Zhikun
MAO, Yuren
GAO, Yunjun
author_sort GUO, Yuxiang
title CampER: An effective framework for privacy-aware deep entity resolution
publisher Institutional Knowledge at Singapore Management University
publishDate 2023
url https://ink.library.smu.edu.sg/sis_research/8106
https://ink.library.smu.edu.sg/context/sis_research/article/9109/viewcontent/3580305.3599266.pdf
_version_ 1794549702894551040