Differential privacy for survival analysis and user data collection

Bibliographic Details
Main Author: Nguyen, Thong T.
Other Authors: Hui Siu Cheung
Format: Theses and Dissertations
Language: English
Published: 2019
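Subjects: DRNTU::Engineering::Computer science and engineering::Information systems::Database management; DRNTU::Engineering::Computer science and engineering::Mathematics of computing::Probability and statistics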
Online Access: https://hdl.handle.net/10356/85347
http://hdl.handle.net/10220/48212
Institution: Nanyang Technological University
id sg-ntu-dr.10356-85347
record_format dspace
School: School of Computer Science and Engineering
Degree: Doctor of Philosophy
Type: Thesis
Extent: 196 p.
File format: application/pdf
DOI: 10.32657/10220/48212
Citation: Nguyen, T. T. (2019). Differential privacy for survival analysis and user data collection. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/85347
Record dates: 2019-05-15T08:28:11Z; 2019-12-06T16:02:07Z; 2020-07-01T05:43:00Z
building NTU Library
country Singapore
collection DR-NTU
topic DRNTU::Engineering::Computer science and engineering::Information systems::Database management
DRNTU::Engineering::Computer science and engineering::Mathematics of computing::Probability and statistics
description Most personal information today exists in the form of digital data, including sensitive information such as medical records, credit card information, and private instant messages. In this research, we investigate the data privacy problem in collecting and mining sensitive user information. We focus on: (i) data privacy in survival analysis, which uses medical records to learn survival models for medical research; and (ii) data privacy in user data collection, a current practice of many corporations and governments. We use differential privacy, the gold standard in privacy protection, to address the data privacy problem in both survival analysis and user data collection. To this end, we aim to achieve the following:
• Guaranteeing privacy for survival analysis models, including (i) parametric and nonparametric survival models, and (ii) continuous-time and discrete-time survival regression models.
• Guaranteeing privacy for users whose data is collected by corporations and governments.
The main contributions of this thesis are as follows:
• For nonparametric survival models, we have proposed a private mechanism for two popular nonparametric estimators, namely the Kaplan-Meier and Nelson-Aalen estimators. For parametric survival models, we have proposed a simple private mechanism for accurately estimating the parameter of the exponential distribution, as well as a private mechanism based on local sensitivity for estimating the parameters of the Weibull distribution.
• For estimating uncertainty in parametric survival models, we have proposed a private framework for learning the posterior function, and we have applied it to Weibull parametric models and to flexible parametric models.
• For the discrete-time survival regression model, we have proposed three private estimation approaches: an extended output perturbation approach, an extended objective perturbation approach, and a posterior sampling approach.
• For the continuous-time survival regression model, we have proposed a posterior sampling approach. We have also proposed a posterior perturbing approach that supports a relaxation of differential privacy, for scenarios in which strict differential privacy is impractical.
• For user data collection, we have proposed mechanisms that allow each user to publish a randomized vector of categorical and numerical data. The proposed mechanisms are asymptotically optimal in both accuracy and run time. We have also applied them to supervised learning problems under the empirical risk minimization framework.
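The record mentions a simple private mechanism for the exponential distribution's parameter but does not specify it. As a generic illustration only, and not the thesis's mechanism, the sketch below applies the Laplace mechanism to the two sufficient statistics of the censored-data maximum-likelihood estimate, λ̂ = d / Σ t_i, where d is the number of observed events. It assumes one record per individual, a public upper bound `t_max` on follow-up time, and an even split of the privacy budget. The function name `private_exponential_rate` is hypothetical.

```python
import numpy as np


def private_exponential_rate(times, events, epsilon, t_max, rng=None):
    """Illustrative epsilon-DP estimate of an exponential hazard rate.

    Hypothetical sketch (not the thesis's mechanism): Laplace noise is added to
    the two sufficient statistics of the MLE lambda_hat = d / sum(t).
    Assumptions: one record per individual, follow-up times clipped to the
    public bound [0, t_max], and the budget split evenly between the statistics.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.clip(np.asarray(times, dtype=float), 0.0, t_max)
    d = float(np.sum(np.asarray(events, dtype=bool)))  # event count: sensitivity 1
    total_time = float(np.sum(t))                       # total time: sensitivity t_max

    eps_half = epsilon / 2.0  # sequential composition: eps_half + eps_half = epsilon
    noisy_d = d + rng.laplace(0.0, 1.0 / eps_half)
    noisy_total = total_time + rng.laplace(0.0, t_max / eps_half)

    # Post-processing (no extra privacy cost): keep both statistics positive.
    noisy_d = max(noisy_d, 0.0)
    noisy_total = max(noisy_total, 1e-9)
    return noisy_d / noisy_total
```

Clipping to `t_max` is what bounds the sensitivity of the total-time statistic: without a public bound, a single long follow-up could change it arbitrarily. With small samples or small ε the ratio is noisy, so this sketch is best read as a demonstration of the sensitivity argument rather than an accurate estimator.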
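The record also does not describe how a user's randomized vector of categorical and numerical data is produced. For readers new to the local model, the following is a minimal sketch of a standard, deliberately non-optimal baseline, assumed here rather than taken from the thesis: each user perturbs one categorical attribute with k-ary (generalized) randomized response and one numerical attribute in [-1, 1] with the one-dimensional mechanism of Duchi et al., splitting ε evenly between the two. All function names are hypothetical.

```python
import numpy as np


def grr_perturb(value, k, epsilon, rng):
    """k-ary (generalized) randomized response for a category in {0, ..., k-1}."""
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return int(value)
    # Otherwise report one of the other k - 1 categories uniformly at random.
    other = int(rng.integers(k - 1))
    return other if other < value else other + 1


def grr_estimate(reports, k, epsilon):
    """Unbiased estimate of category frequencies from GRR reports."""
    denom = np.exp(epsilon) + k - 1
    p, q = np.exp(epsilon) / denom, 1.0 / denom
    observed = np.bincount(np.asarray(reports), minlength=k) / len(reports)
    return (observed - q) / (p - q)


def duchi_perturb(x, epsilon, rng):
    """Duchi et al.'s one-dimensional mechanism; assumes x in [-1, 1]. Output is unbiased."""
    e = np.exp(epsilon)
    c = (e + 1) / (e - 1)
    p_pos = (x * (e - 1) + e + 1) / (2 * (e + 1))
    return c if rng.random() < p_pos else -c


def perturb_record(category, number, k, epsilon, rng):
    """Naive baseline: split epsilon evenly over the two attributes."""
    eps_each = epsilon / 2.0  # sequential composition gives epsilon-LDP overall
    return grr_perturb(category, k, eps_each, rng), duchi_perturb(number, eps_each, rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k, epsilon = 20_000, 4, 2.0
    cats = rng.integers(k, size=n)
    nums = rng.uniform(-0.3, 0.7, size=n)  # true mean is about 0.2
    reports = [perturb_record(c, x, k, epsilon, rng) for c, x in zip(cats, nums)]
    print(grr_estimate([r[0] for r in reports], k, epsilon / 2))
    print(np.mean([r[1] for r in reports]))
```

Because each attribute receives only a fraction of ε, the per-attribute error of this baseline grows quickly with the number of attributes; it should not be read as the asymptotically optimal mechanisms the record refers to.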