Provable de-anonymization of large datasets with sparse dimensions

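There is a significant body of empirical work on statistical de-anonymization attacks against databases containing micro-data about individuals, e.g., their preferences, movie ratings, or transaction data. Our goal is to analytically explain why such attacks work. Specifically, we analyze a variant of the Narayanan-Shmatikov algorithm that was used to effectively de-anonymize the Netflix database of movie ratings. We prove theorems characterizing mathematical properties of the database and the auxiliary information available to the adversary that enable two classes of privacy attacks. In the first attack, the adversary successfully identifies the individual about whom she possesses auxiliary information (an isolation attack). In the second attack, the adversary learns additional information about the individual, although she may not be able to uniquely identify him (an information amplification attack). We demonstrate the applicability of the analytical results by empirically verifying that the mathematical properties assumed of the database are actually true for a significant fraction of the records in the Netflix movie ratings database, which contains ratings from about 500,000 users.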
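
An isolation attack of the kind this paper studies rests on scoring every record against the adversary's auxiliary information and declaring a match only when the best score clearly stands out. The Python sketch below illustrates that idea in the style of the original Narayanan-Shmatikov scoreboard; it is not the variant analyzed in this paper. The similarity test (ratings within `tol`), the support floor of 2, the threshold `phi=1.5`, and the function names are hypothetical choices made for illustration.

```python
import math
import statistics
from typing import Dict, List, Optional

# A sparse record: item id -> rating. Most items are absent from any given record.
Record = Dict[int, float]


def item_weights(db: List[Record]) -> Dict[int, float]:
    """wt(i) = 1 / log|supp(i)|, where supp(i) is the set of records rating item i.

    Rarely rated items get larger weights. Support is floored at 2 so that an
    item rated by one record does not divide by log(1) = 0 (an assumption).
    """
    support: Dict[int, int] = {}
    for rec in db:
        for item in rec:
            support[item] = support.get(item, 0) + 1
    return {item: 1.0 / math.log(max(n, 2)) for item, n in support.items()}


def score(aux: Record, rec: Record, wt: Dict[int, float], tol: float = 1.0) -> float:
    """Sum the weights of aux items that rec also rated within +/- tol (hypothetical Sim)."""
    total = 0.0
    for item, rating in aux.items():
        if item in rec and abs(rec[item] - rating) <= tol:
            total += wt[item]
    return total


def isolate(aux: Record, db: List[Record], phi: float = 1.5) -> Optional[int]:
    """Return the index of the best-scoring record, or None if it does not stand out.

    Eccentricity test: (best - second_best) / stdev(all scores) must be >= phi.
    """
    if len(db) < 2:
        return None  # eccentricity is undefined without a runner-up
    wt = item_weights(db)
    scores = [score(aux, rec, wt) for rec in db]
    order = sorted(range(len(db)), key=lambda k: scores[k], reverse=True)
    best, second = scores[order[0]], scores[order[1]]
    sigma = statistics.pstdev(scores)
    if sigma == 0 or (best - second) / sigma < phi:
        return None
    return order[0]
```

Weighting each item by the inverse log of its support makes agreement on rarely rated items count more than agreement on popular ones, which is one intuition for why sparse datasets are vulnerable; the eccentricity check is what turns a best guess into a confident isolation.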

Bibliographic Details
Main Authors:	DATTA, Anupam; SHARMA, Divya; SINHA, Arunesh
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University, 2012
Subjects:	Privacy database de-anonymization Artificial Intelligence and Robotics Computer Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/4471
	https://ink.library.smu.edu.sg/context/sis_research/article/5474/viewcontent/dss_post12_1_.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-5474
record_format	dspace
doi	10.1007/978-3-642-28641-4_13
published	2012-04-01T07:00:00Z
record_updated	2019-12-05T06:34:15Z
file_format	application/pdf
rights	http://creativecommons.org/licenses/by-nc-nd/4.0/
series	Research Collection School Of Computing and Information Systems
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Privacy database de-anonymization Artificial Intelligence and Robotics Computer Engineering
description	There is a significant body of empirical work on statistical de-anonymization attacks against databases containing micro-data about individuals, e.g., their preferences, movie ratings, or transaction data. Our goal is to analytically explain why such attacks work. Specifically, we analyze a variant of the Narayanan-Shmatikov algorithm that was used to effectively de-anonymize the Netflix database of movie ratings. We prove theorems characterizing mathematical properties of the database and the auxiliary information available to the adversary that enable two classes of privacy attacks. In the first attack, the adversary successfully identifies the individual about whom she possesses auxiliary information (an isolation attack). In the second attack, the adversary learns additional information about the individual, although she may not be able to uniquely identify him (an information amplification attack). We demonstrate the applicability of the analytical results by empirically verifying that the mathematical properties assumed of the database are actually true for a significant fraction of the records in the Netflix movie ratings database, which contains ratings from about 500,000 users.
format	text
author	DATTA, Anupam SHARMA, Divya SINHA, Arunesh
title	Provable de-anonymization of large datasets with sparse dimensions
publisher	Institutional Knowledge at Singapore Management University
publishDate	2012
url	https://ink.library.smu.edu.sg/sis_research/4471 https://ink.library.smu.edu.sg/context/sis_research/article/5474/viewcontent/dss_post12_1_.pdf