Robust AI: security and privacy issues in machine learning

Bibliographic Details
Main Author: Chattopadhyay, Nandish
Other Authors: Anupam Chattopadhyay
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2023
Subjects:
Online Access: https://hdl.handle.net/10356/165248
Institution: Nanyang Technological University
Description
Summary: Machine learning-based decision making can be adopted in practice as a driver of most applications only when there are strong guarantees on its reliability. The trust of the stakeholders involved needs to be established for it to become more ubiquitous and acceptable. In general, reliability in machine learning can be construed as the sum of two parts, Robustness and Resilience. Since reliability is concerned with providing assurances against malfunctions or errors, it can also be classified by the types of those errors. This thesis deals with Robustness, that is, robustness against attacks, which generate intentional and malicious errors. Robustness is therefore studied in conjunction with the mechanisms an adversary may adopt to disrupt the machine learning application.

To investigate Robustness in ML algorithms against various forms of attacks, we break the problem down into components and consider all possible combinations. Typically, Robustness is studied with respect to security and privacy. The ML pipeline itself comprises the trained model and the training data, so we consider security- and privacy-related problems pertaining to both components, the model and the data. The work touches upon many important problems in this regard, but it is not exhaustive. We place all attacks that jeopardize the fundamental ML task itself under the umbrella of security, and those that leak secret or sensitive information about the system under privacy.

The primary security vulnerability of machine learning models is adversarial attacks. In the first part, we consider security for models and study adversarial attacks through the lens of dimensionality. We assert that the high-dimensional landscape in which neural network models optimize facilitates the generation of adversarial examples, and that dimensionality reduction enhances adversarial robustness. We have explored the mathematical background for this proposition, studying the properties of data distributions in high-dimensional spaces and the nature of the trained manifolds, and have justified the idea empirically. We have extended this notion of the influence of dimensionality on adversarial sample generation from images to videos and text, and provided practical and efficient solutions by combining adversarial sample detection with dimensionality reduction. Reducing the dimensionality carries an additional computational cost and can, in some cases, also have an adverse effect on the fundamental machine learning task itself. We therefore optimise the dimension reduction operation for each task and use-case, carefully choosing the amount of variability to preserve so that adversarial noise is effectively eliminated while the information needed for classification or object detection is retained. Additionally, in one of the works, we run the classification task and adversarial sample detection in parallel channels and apply dimensionality reduction only to those samples detected as adversarial, which significantly improves the efficiency of the overall system and proves beneficial in terms of accuracy as well.
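A minimal sketch of this defence idea is given below, assuming a PCA-based reducer fitted on clean data; the `detector` and `classifier` callables and all function names are hypothetical stand-ins, not the models or code used in the thesis.

```python
# Minimal sketch of the dimensionality-reduction defence described above,
# assuming a PCA reducer fitted on clean data. `detector` and `classifier`
# are hypothetical stand-ins, not the models used in the thesis.
import numpy as np
from sklearn.decomposition import PCA

def fit_reducer(clean_images, variance_to_keep=0.95):
    """Fit PCA on flattened clean images; `variance_to_keep` controls how
    much variability is preserved (tuned per task and use-case)."""
    X = clean_images.reshape(len(clean_images), -1)
    return PCA(n_components=variance_to_keep).fit(X)

def project_and_reconstruct(pca, images):
    """Map images onto the low-dimensional subspace and back, which is
    intended to suppress adversarial perturbations."""
    X = images.reshape(len(images), -1)
    return pca.inverse_transform(pca.transform(X)).reshape(images.shape)

def classify_with_defence(images, detector, classifier, pca):
    """Parallel-channel idea: only samples flagged as adversarial by the
    detector are projected before classification."""
    suspicious = detector(images)          # boolean mask, one flag per image
    cleaned = images.astype(float).copy()
    if suspicious.any():
        cleaned[suspicious] = project_and_reconstruct(pca, images[suspicious])
    return classifier(cleaned)
```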
Thereafter, we study this security flaw from the perspective of vulnerable features within the data. We have analysed spatially correlated patterns within adversarial images. For each class, we split images into two key parts, the Region of Importance (RoI) and the Region of Attack (RoA), such that the RoI is the region the classifier is particularly sensitive to during classification, and the RoA is the region the adversarial attack modifies. The goal of this exercise was to identify areas within the images that do not contribute to the classification task but are adversarially vulnerable. The adversarial defence mechanism we propose from this work is to neutralize that region, thereby reducing adversarial vulnerability without compromising classification accuracy. The idea is demonstrated on benchmark datasets and models.

Moving on to the aspect of privacy in the second part, we look at some very different problems: first privacy for the models, followed by that of the data. In the context of preserving the privacy of trained neural network models, we direct our attention to protecting their ownership and IP rights using watermarking. We review the state of the art in watermarking schemes for neural networks and select the most appropriate ones to study; watermarking using backdooring is the scheme of choice here. We investigate the vulnerabilities of this scheme and break it using synthesized data. In our proposition, titled Re-Markable, we assume that the adversary has very limited compute power and access to samples from the data distribution relevant to the task. We train a GAN (Generative Adversarial Network) to synthesize more samples and use them to re-train just the fully connected layers of the watermarked models. As demonstrated, this minimal computation turns out to be sufficient to eliminate the embedded watermarks from the model, a vulnerability that makes the existing scheme extremely unreliable.

To solve the problem thus discovered, we worked on a robust watermarking scheme that overcomes this vulnerability. ROWBACK, or robust watermarking for neural networks using backdooring, uses a redesigned mechanism for generating the Trigger Set (using adversarial examples with explicit labelling), which serves as the private key for the watermark, together with a method of explicitly marking every layer of the neural network with the embedded watermarks. The goal is to ensure that an adversary interested in extracting the network would need to re-train every layer of the model, which is as good as training a fresh model from scratch, as it would require extensive training samples and compute power. We also extended the idea of robust watermarking to the domain of natural language processing, particularly text classifiers. TextBack, designed to embed watermarks within text classifiers using backdooring, uses a marking scheme that fine-tunes on Trigger samples and clean samples together, unlike the image setting, because this requirement is observed only in sequential models such as recurrent neural networks and LSTM-based models.
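As an illustration of the backdoor-based watermarking idea, ownership can be claimed by checking whether a suspect model reproduces the owner's secret Trigger Set labelling far above chance. The following is a sketch under assumptions, not the ROWBACK or TextBack implementation; `suspect_model` is a hypothetical callable returning predicted class ids.

```python
# Sketch (not the ROWBACK/TextBack code) of backdoor-style watermark
# verification: ownership is claimed only if a suspect model reproduces the
# owner's secret Trigger Set labelling far above the chance rate.
import numpy as np

def verify_watermark(suspect_model, trigger_inputs, trigger_labels,
                     num_classes, threshold=0.8):
    """Return (claim, accuracy): claim is True when accuracy on the secret
    Trigger Set clears both the fixed threshold and twice the chance rate."""
    predictions = suspect_model(trigger_inputs)
    accuracy = float(np.mean(predictions == np.asarray(trigger_labels)))
    chance = 1.0 / num_classes
    return accuracy >= max(threshold, 2.0 * chance), accuracy
```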
Finally, to cover the privacy of data, we focus on collaborative ML. The motivation in this work is to protect the privacy of the data used for training, as many practical applications necessitate the use of highly sensitive data. This data is decentralised, resides with different non-collocated entities, and is not sharable on privacy grounds.

The straightforward solution to this problem available in the literature is Federated Learning. However, an adversary may still tap into the federated learning infrastructure in multiple ways and extract information. For example, Membership Inference attacks are possible on models deployed in the cloud (which is natural in many federated learning setups): the model can be queried repeatedly, and from the output probabilities it returns, a separate attack model can be trained to determine whether a particular query input belonged to the training set. Carrying out this process many times can, in theory, lead to the reverse-engineering of the entire training set, which is a serious privacy violation. We address this problem using Differential Privacy: each participant of the federated learning infrastructure uses a differentially private learning algorithm for local training, involving gradient clipping and the addition of sampled noise. We studied practical applications of such collaborative learning systems and deployed the framework on edge devices by creating light-weight versions of the models that do not compromise on accuracy. Similarly, in another use-case, the participants of the federated learning setup may themselves have malicious intentions and mount sabotaging attacks on the learning framework. Considering such potential single points of failure in the overall system, we propose a robust federated learning infrastructure that assigns coefficients to the updates sent by the clients to the server, tolerating faults in up to 50% of the clients.

Overall, the ideas discussed in this thesis are a major step towards making machine learning systems more robust and are therefore a necessary step in the direction of reliable AI.
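For concreteness, a simplified sketch of the two federated learning mechanisms described above follows; the Gaussian noise scale and the inverse-distance coefficient rule are illustrative assumptions, not the exact algorithms proposed in the thesis.

```python
# Simplified sketch of the two mechanisms above: a differentially private
# local step and a coefficient-weighted robust aggregation at the server.
import numpy as np

def dp_local_step(weights, gradient, lr=0.01, clip_norm=1.0, noise_std=0.1):
    """One differentially private local update: clip the gradient to
    `clip_norm`, add calibrated Gaussian noise, then take a gradient step."""
    norm = np.linalg.norm(gradient)
    clipped = gradient * min(1.0, clip_norm / (norm + 1e-12))
    noisy = clipped + np.random.normal(0.0, noise_std * clip_norm, gradient.shape)
    return weights - lr * noisy

def robust_aggregate(client_updates):
    """Server-side aggregation that assigns each client's update a coefficient
    inversely related to its distance from the coordinate-wise median, so a
    minority of faulty or malicious clients cannot dominate the global model."""
    updates = np.stack(client_updates)               # (num_clients, dim)
    median = np.median(updates, axis=0)
    distances = np.linalg.norm(updates - median, axis=1)
    coefficients = 1.0 / (1.0 + distances)
    coefficients /= coefficients.sum()
    return (coefficients[:, None] * updates).sum(axis=0)
```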