Adversarial robustness of deep reinforcement learning

Bibliographic Details
Main Author: Qu, Xinghua
Other Authors: Ong Yew Soon
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2022
Subjects:
Online Access:https://hdl.handle.net/10356/154587
Institution: Nanyang Technological University
Description
Summary: Over the past decades, advances in deep reinforcement learning (DRL) have demonstrated that deep neural network (DNN) policies can be trained to prescribe near-optimal actions in many complex tasks. Unfortunately, DNN policies have been shown to be vulnerable to adversarial perturbations in the input states, which creates obstacles to the real-world deployment of RL agents, especially in security-sensitive tasks. Various adversarial attacks for understanding this vulnerability, and corresponding defense approaches for resisting such attacks, have been proposed. Although some achievements and interesting findings have been reported, existing adversarial attacks are considered less realistic because of the extensive assumptions they rely on, such as white-box policy access and a full-state adversary. Moreover, existing defense approaches are largely built on adversarial training, an adversary-dependent defense that is likewise unrealistic in the wild. Given these research gaps, investigating more realistic adversarial robustness evaluation procedures (i.e., through adversarial attacks) and accordingly designing robust policies have become significant but underdeveloped topics in DRL.

In this dissertation, we first propose a minimalistic attack in Chapter 3 by taking a more restrictive view of adversary generation, with the goal of unveiling the limits of a DRL model's vulnerability. To this end, we define three key settings: (1) black-box policy access, where the attacker only has access to the input (state) and output (action probability) of an RL policy; (2) fractional-state adversary, where only a few pixels are perturbed, with the extreme case being a single-pixel adversary; and (3) tactically-chanced attack, where only significant frames are tactically chosen to be attacked. We formulate the adversarial attack to accommodate these three settings and explore its potency on six Atari games by examining four fully trained state-of-the-art policies. In Breakout, for example, we surprisingly find that (i) all policies show significant performance degradation when merely 0.01% of the input state is modified, and (ii) the policy trained by DQN is completely deceived when only 1% of frames are perturbed.

Secondly, considering the computational complexity of the minimalistic attacks in Chapter 3, which treat every frame in isolation, Chapter 4 presents the first study of how transferability across frames can be exploited to accelerate the creation of minimal yet powerful attacks in image-based RL. We introduce three types of frame-correlation transfers (anterior case transfer, random projection based transfer, and principal components based transfer) with varying degrees of computational complexity for generating adversaries via a genetic algorithm. We empirically demonstrate the trade-off between the complexity and the potency of the transfer mechanism by evaluating four fully trained state-of-the-art policies on six Atari games. Our frame-correlation transfers dramatically speed up attack generation compared to existing methods, often reducing the required computation time significantly, thereby shedding light on the real threat of real-time attacks in RL.
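As a rough illustration of the black-box, fractional-state setting described above, the sketch below perturbs a single pixel of one observation using a simple random search over pixel positions and intensities, keeping the candidate that most reduces the probability of the action the clean policy would take. This is only a stand-in for the optimization procedure developed in the dissertation (and omits the tactically-chanced frame selection); the callable name policy_probs, the query budget, and the single grayscale observation are illustrative assumptions.

import numpy as np

def single_pixel_attack(policy_probs, state, n_queries=400, seed=0):
    """Black-box, fractional-state adversary sketch (illustrative only).

    policy_probs -- callable returning the policy's action-probability vector
                    for a given state; only input/output access is assumed.
    state        -- a single observation as a 2-D float array in [0, 1].
    n_queries    -- query budget for the random search (hypothetical value).
    """
    rng = np.random.default_rng(seed)
    base_probs = policy_probs(state)
    target_action = int(np.argmax(base_probs))   # action the clean policy prescribes
    best_state = state
    best_prob = float(base_probs[target_action])

    h, w = state.shape
    for _ in range(n_queries):
        y, x = int(rng.integers(h)), int(rng.integers(w))   # pixel to perturb
        candidate = state.copy()
        candidate[y, x] = rng.random()                      # new pixel intensity
        prob = float(policy_probs(candidate)[target_action])
        if prob < best_prob:                                # bigger drop = stronger attack
            best_state, best_prob = candidate, prob
    return best_state, target_action, best_prob

A stronger search (e.g., the genetic algorithm used for the frame-correlation transfers in Chapter 4) can replace the random loop without changing the black-box interface: the attacker still only queries states and reads back action probabilities.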
Last but not least, to alleviate the vulnerability of DRL, we propose in the final chapter an adversary-agnostic defense approach that increases the robustness of existing DRL policies. In particular, previous approaches to robustifying DRL policies assume that knowledge of the adversaries can be injected into the training process so that the policy generalizes to the corresponding perturbed observations. Such an assumption not only makes robustness improvement more expensive, but may also leave the model less effective against other kinds of attacks in the wild. In contrast, we propose an adversary-agnostic robust DRL paradigm that does not require learning from adversaries. To this end, we first derive theoretically that robustness can indeed be achieved independently of the adversaries in a policy distillation setting. Motivated by this finding, we propose a new policy distillation loss with two terms: 1) a prescription gap maximization loss that simultaneously maximizes the likelihood of the action selected by the teacher policy and the entropy over the remaining actions; and 2) a corresponding Jacobian regularization loss that minimizes the magnitude of the gradient with respect to the input state. Our theoretical analysis shows that this distillation loss is guaranteed to increase the prescription gap and the adversarial robustness. Furthermore, experiments on five Atari games verify the superiority of our approach in boosting adversarial robustness compared with other state-of-the-art methods.

Most importantly, we hope this dissertation will provide a useful starting point for both the evaluation and the improvement of adversarial robustness in DRL.
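To make the two-term distillation loss described above more concrete, here is a minimal PyTorch-style sketch, assuming the student network maps states to action logits and the teacher supplies the prescribed action for each state. The particular masking, entropy computation, and the weight lambda_jac are illustrative choices, not the exact formulation from the dissertation.

import torch
import torch.nn as nn
import torch.nn.functional as F

def a2pd_style_loss(student: nn.Module, state: torch.Tensor,
                    teacher_action: torch.Tensor, lambda_jac: float = 0.01):
    """Adversary-agnostic distillation loss sketch (illustrative only).

    student        -- policy network being distilled (states -> action logits)
    state          -- batch of input states, shape (B, ...)
    teacher_action -- actions prescribed by the teacher policy, shape (B,)
    lambda_jac     -- weight of the Jacobian term (hypothetical value)
    """
    state = state.clone().requires_grad_(True)
    log_probs = F.log_softmax(student(state), dim=-1)
    probs = log_probs.exp()

    # Prescription gap term: raise the log-probability of the teacher's action
    # while keeping high entropy over the remaining (non-teacher) actions.
    teacher_logp = log_probs.gather(1, teacher_action.unsqueeze(1)).squeeze(1)
    mask = torch.ones_like(probs).scatter_(1, teacher_action.unsqueeze(1), 0.0)
    rest = probs * mask
    rest = rest / rest.sum(dim=1, keepdim=True).clamp_min(1e-8)
    rest_entropy = -(rest * rest.clamp_min(1e-8).log()).sum(dim=1)
    pgm_loss = -(teacher_logp + rest_entropy).mean()

    # Jacobian regularization: penalize the gradient of the selected-action
    # log-probability w.r.t. the input state, so small input perturbations
    # change the prescription as little as possible.
    grad = torch.autograd.grad(teacher_logp.sum(), state, create_graph=True)[0]
    jac_loss = grad.flatten(1).pow(2).sum(dim=1).mean()

    return pgm_loss + lambda_jac * jac_loss

Because the loss depends only on the teacher's prescriptions and the student's input gradients, no adversarial examples are generated at any point during training, which is the sense in which the defense is adversary agnostic.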