Privacy-Preserving Data Sharing in High Dimensional Regression and Classification Settings

We focus on the problem of multi-party data sharing in high dimensional data settings where the number of measured features (or the dimension) p is frequently much larger than the number of subjects (or the sample size) n, the so-called p >> n scenario that has been the focus of much recent sta...

Main Authors: FIENBERG, Stephen E., JIN, Jiashun
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2012
Online Access: https://ink.library.smu.edu.sg/larc/1
https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1000&context=larc
Institution: Singapore Management University
id sg-smu-ink.larc-1000
license http://creativecommons.org/licenses/by-nc-nd/4.0/
series LARC Research Publications
building SMU Libraries
country Singapore
collection InK@SMU
topic Databases and Information Systems
Information Security
description We focus on the problem of multi-party data sharing in high dimensional data settings where the number of measured features (or the dimension) p is frequently much larger than the number of subjects (or the sample size) n, the so-called p >> n scenario that has been the focus of much recent statistical research. Here, we consider data sharing for two interconnected problems in high dimensional data analysis, namely feature selection and classification. We characterize the notions of "cautious", "regular", and "generous" data sharing in terms of their privacy-preserving implications for the parties and their share of data, with a focus on "feature privacy" rather than "sample privacy," though a violation of the former may lead to a violation of the latter. We evaluate the data sharing methods using a phase diagram from the statistical literature on multiplicity and Higher Criticism thresholding. In the two-dimensional phase space calibrated by the signal sparsity and signal strength, a phase diagram is a partition of the phase space into three distinct regions, in which we have no (feature) privacy violation, relatively rare privacy violations, and an overwhelming amount of privacy violation, respectively.