Privacy preservation for linear and learning based inference systems

Bibliographic Details
Main Author: Wang, Chong Xiao
Other Authors: Tay Wee Peng
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2021
Online Access:https://hdl.handle.net/10356/153057
Institution: Nanyang Technological University
Description
Summary: In data sharing and decentralized processing, data belonging to a user need to be transmitted to a third-party data processor or aggregator. The third-party unit, however, may be curious about private information hidden in the unprocessed data, which poses a tremendous threat to the data owner's privacy. It therefore becomes imperative for the data owner to sanitize its raw data, limiting the amount of private information they carry, before disclosing them. At the same time, the data owner wishes to preserve as much as possible the utility of the sanitized data for the legitimate task that it wants the third-party unit to perform. In this thesis, we investigate the problem of privacy-preserving parameter estimation in a decentralized linear system and in a linear dynamical system with known system models. We also explore data-driven approaches to protect the private information carried by raw data that serve as input to a learning-based inference system.

First, we consider a multi-agent system in which each agent in a network makes a local observation that is linearly related to a set of public and private parameters. The agents send their observations to a fusion center to allow it to estimate the public parameters. To prevent leakage of the private parameters, each agent first sanitizes its local observation using a local privacy mechanism before transmitting it to the fusion center. We study the utility-privacy tradeoff in terms of the Cramér-Rao lower bounds for estimating the public and private parameters, and compare the class of privacy mechanisms given by linear compression and noise perturbation (a toy numerical sketch of this setup is given after the summary). We derive necessary and sufficient conditions for achieving arbitrarily strong privacy without compromising utility, and provide a method to maximize privacy while keeping utility intact. Finally, we propose an algorithm that optimizes the utility-privacy tradeoff subject to a maximum allowable privacy leakage.

Building on the first case, we consider a linear dynamical system whose state vector is composed of public and private parts and evolves according to a known transition model. Multiple sensors make a series of measurements of the state vector and send the data to a fusion center that is authorized to estimate the public states. To prevent the fusion center from estimating the private states accurately, each sensor's measurements at each time step are linearly compressed into a lower-dimensional space before being sent to the fusion center (see the second sketch below). We account for the fact that the public states at one time step may provide statistical information about the private states at a future time step, and propose a formulation that ensures the same level of privacy at all future time steps. We develop online algorithms to find the compression matrix that optimizes the utility-privacy tradeoff.

Finally, we consider a practical setup in which a precise system model cannot be derived analytically because of the complexity of the underlying data and must instead be learned from the data themselves. Service providers use such data as input to inference systems that perform decision making for authorized tasks; the raw data, however, also allow a service provider to infer sensitive information it has not been authorized to access. We explore data-driven approaches to sanitize data so as to prevent leakage of the sensitive information present in the raw data. We deploy mutual information and maximal correlation as privacy leakage measures (see the third sketch below) and show that their practical implementation requires different configurations. We present empirical estimators of these privacy metrics and carry out an asymptotic analysis of them. We propose to regularize the domain of the sanitized data so that it remains compatible with the service provider's legacy inference system. We develop a deep learning model as an example of the proposed privacy framework.
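
The following is a minimal numerical sketch of the first setup, not an algorithm from the thesis: each agent observes a linear mix of public parameters theta and private parameters phi, sanitizes its observation by linear compression plus Gaussian noise perturbation, and the fusion center's achievable accuracy for each parameter set is gauged by Cramér-Rao lower bounds. All dimensions, mixing matrices, compression matrices, and noise levels are illustrative assumptions.

```python
# Toy CRLB comparison for the decentralized linear setup (illustrative values only).
import numpy as np

rng = np.random.default_rng(0)
n_agents, m, p, q = 5, 6, 3, 2           # agents, obs. dim, dim(theta), dim(phi)
sigma, sigma_w = 0.1, 0.5                # observation / perturbation noise std
A = [rng.standard_normal((m, p)) for _ in range(n_agents)]      # public mixing
B = [rng.standard_normal((m, q)) for _ in range(n_agents)]      # private mixing
C = [rng.standard_normal((m - 2, m)) for _ in range(n_agents)]  # compression

def crlb_blocks(G_pub, G_priv, cov):
    """CRLB sub-blocks for the stacked parameter [theta; phi] in a linear Gaussian model."""
    G = np.hstack([G_pub, G_priv])
    fim = G.T @ np.linalg.solve(cov, G)   # Fisher information matrix
    crlb = np.linalg.inv(fim)
    return crlb[:p, :p], crlb[p:, p:]

# Unsanitized data: the fusion center receives every y_k = A_k theta + B_k phi + n_k.
crlb_pub, crlb_priv = crlb_blocks(np.vstack(A), np.vstack(B),
                                  sigma**2 * np.eye(n_agents * m))

# Sanitized data: z_k = C_k y_k + w_k (linear compression + noise perturbation).
Cblk = np.zeros((n_agents * (m - 2), n_agents * m))
for k in range(n_agents):
    Cblk[k * (m - 2):(k + 1) * (m - 2), k * m:(k + 1) * m] = C[k]
cov_z = sigma**2 * Cblk @ Cblk.T + sigma_w**2 * np.eye(n_agents * (m - 2))
crlb_pub_s, crlb_priv_s = crlb_blocks(Cblk @ np.vstack(A), Cblk @ np.vstack(B), cov_z)

print("public CRLB trace  (utility, lower is better) :", np.trace(crlb_pub), "->", np.trace(crlb_pub_s))
print("private CRLB trace (privacy, higher is better):", np.trace(crlb_priv), "->", np.trace(crlb_priv_s))
```

With an arbitrary (here random) compression both bounds typically grow; the design problem described in the summary is to choose the compression and noise so that the private CRLB is driven up while the public CRLB is degraded as little as possible.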
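A similarly simplified sketch of the second setup: a Kalman filter run at the fusion center on per-step compressed measurements, whose posterior covariance blocks indicate how well the public and private state blocks can be tracked. The transition model, sensor model, compression matrix, and noise levels are assumed purely for illustration, and the compression is held fixed here, whereas the thesis develops online algorithms to optimize it.

```python
# Toy Kalman-filter covariance recursion with a fixed compression matrix (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
n_pub, n_priv = 2, 2
n = n_pub + n_priv
F = 0.95 * np.eye(n) + 0.05 * rng.standard_normal((n, n))   # state transition
Q = 0.01 * np.eye(n)                                        # process noise cov
H = rng.standard_normal((6, n))                             # stacked sensor model
R = 0.04 * np.eye(6)                                        # measurement noise cov
C = rng.standard_normal((3, 6))                             # fixed compression (assumed)

P = np.eye(n)          # prior error covariance at the fusion center
for t in range(50):
    # Predict.
    P = F @ P @ F.T + Q
    # Update with the compressed measurement z_t = C (H x_t + v_t).
    Hc = C @ H
    Rc = C @ R @ C.T
    S = Hc @ P @ Hc.T + Rc
    K = P @ Hc.T @ np.linalg.inv(S)
    P = (np.eye(n) - K @ Hc) @ P

print("error-cov trace, public states :", np.trace(P[:n_pub, :n_pub]))
print("error-cov trace, private states:", np.trace(P[n_pub:, n_pub:]))
```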
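Finally, a minimal plug-in estimate of mutual information between a sanitized representation Z and a sensitive attribute S illustrates how an empirical leakage measure can be computed from samples. This simple histogram-based estimator and the toy data are assumptions for illustration; they are not the estimators analyzed in the thesis.

```python
# Plug-in empirical mutual-information estimate between a real-valued Z and a discrete S.
import numpy as np

def empirical_mutual_information(z, s, n_bins=10):
    """Plug-in MI estimate (in nats) from paired samples of Z (real-valued) and S (discrete)."""
    cuts = np.quantile(z, np.linspace(0, 1, n_bins + 1)[1:-1])   # interior quantile cut points
    z_binned = np.digitize(z, cuts)                              # bin indices 0..n_bins-1
    joint = np.zeros((n_bins, int(s.max()) + 1))
    for zb, sb in zip(z_binned, s):
        joint[zb, sb] += 1
    joint /= joint.sum()
    pz = joint.sum(axis=1, keepdims=True)
    ps = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pz @ ps)[mask])))

# Toy example: Z partially reveals a binary sensitive attribute S.
rng = np.random.default_rng(2)
s = rng.integers(0, 2, size=5000)
z = s + 1.5 * rng.standard_normal(5000)      # noisy leakage of S into Z
print("estimated leakage I(Z;S) in nats:", empirical_mutual_information(z, s))
```

One common design in learning-based sanitization, not necessarily the exact architecture developed in the thesis, is to use such a leakage estimate (or a neural variational bound on it) as a regularizer that penalizes information about S in the sanitized output while a task loss preserves utility for the authorized inference.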