Differentially private deep learning for time series data

Bibliographic Details
Main Author: Dwitami, Inggriany
Other Authors: Wang Huaxiong
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
Subjects:
Online Access:https://hdl.handle.net/10356/144846
Institution: Nanyang Technological University
Description
Summary: Machine learning applications based on neural networks are becoming increasingly widespread. This raises questions about data subjects' privacy, as some of the data sets used may contain sensitive information. To address this, a formal notion of privacy known as Differential Privacy was introduced, and it has given rise to various implementations that satisfy this privacy criterion in the context of machine learning, with Differentially Private Stochastic Gradient Descent (DP-SGD) being one of the most prominent. DP-SGD satisfies the privacy criterion by clipping per-example gradients and adding noise on top of the usual SGD algorithm. For these experiments, time series data was chosen over image data because it is less computationally demanding. The UCR archive and the Medical Information Mart for Intensive Care (MIMIC-III) database are examples of publicly available data sets containing time series data that can be formulated as time series classification problems. The UCR archive covers a wide variety of subjects, while the MIMIC-III database focuses on Electronic Health Records (EHRs). For the latter, privacy is critical to protect patients, making it well suited to differentially private training. In this paper, experiments were conducted on the UCR archive and the MIMIC-III database to evaluate the effect of DP-SGD on model performance, focusing in particular on the Long Short-Term Memory (LSTM) network and the Fully Convolutional Neural Network (FCN). The results show that, in general, models trained without the differentially private optimizer tend to outperform those trained with it, which is expected since data utility is traded off for privacy. However, the difference in performance is sometimes small, or even insignificant. Furthermore, the noise added by DP-SGD can also act as a regularizer that prevents overfitting. This paper recommends future work to further generalize these results, including providing publicly available benchmark data sets, incorporating more models, and comparing various differentially private frameworks.
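
To make the mechanism concrete, below is a minimal NumPy sketch of the per-batch DP-SGD update described in the summary: each example's gradient is clipped to a maximum L2 norm, the clipped gradients are summed, calibrated Gaussian noise is added, and the noisy average drives an ordinary gradient step. The function name and hyperparameter values here are illustrative assumptions, not taken from the project itself.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.05,
                clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One DP-SGD update: clip each example's gradient to a fixed
    L2 norm, sum, add Gaussian noise, average, then take an SGD step."""
    rng = rng if rng is not None else np.random.default_rng()
    batch_size = len(per_example_grads)

    # 1. Clip each per-example gradient so its L2 norm is at most clip_norm.
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]

    # 2. Sum the clipped gradients and add noise scaled to the clipping bound.
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm, size=params.shape)

    # 3. Average over the batch and take an ordinary gradient step.
    return params - lr * noisy_sum / batch_size
```

In practice, libraries such as TensorFlow Privacy and Opacus provide optimizers that perform this per-example clipping and noising automatically, together with the privacy accounting needed to report an (epsilon, delta) guarantee; which implementation the project used is not stated in this record.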