Environment poisoning in reinforcement learning: attacks and resilience

Bibliographic Details
Main Author: Xu, Hang
Other Authors: Zinovi Rabinovich
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2023
Online Access: https://hdl.handle.net/10356/164969
Institution: Nanyang Technological University
Description
Summary: As Reinforcement Learning (RL) systems are increasingly adopted in real-world applications, their security has become correspondingly important, and it is essential to protect them against a variety of adversarial attacks. Among these, training-time attacks are particularly insidious because they poison the RL policy itself. Such attacks attempt to force an RL agent to learn an attacker-desired policy by manipulating the agent's interaction information, such as reward signals and environmental responses. However, training-time attacks are generally assumed to require significant access to the RL system, such as the ability to distort the agent's reward values or experience memory, together with comprehensive knowledge of the agent's learning algorithm, policy model, and/or environment model. Such an omnipotent attacker is unrealistic and therefore poses only a limited threat to real-world RL systems. In contrast, this thesis studies training-time attacks under limited prior knowledge of the RL system, assuming only the ability to alter the training environment's hyper-parameters (i.e., causal factors in physical systems), which are the components most likely to be accessible to third parties. We investigate the security threats posed by training-environment hyper-parameters and present a solution to their adverse impact on RL policies. The thesis consists of three parts: environment poisoning attacks (EPA) with full or limited prior knowledge, and a policy-resilience mechanism against the proposed attacks.

(1) White-Box Environment Poisoning Attack: We propose a transferable environment poisoning attack (TEPA) against RL at training time, assuming the attacker has full knowledge of the RL agent's learning mechanism (i.e., policy training algorithm and policy model structure/parameters) and its environment model (i.e., dynamics functions). We formulate the attack as a bi-level Markov Decision Process, seeking adaptive and minimal environment changes that prompt the agent's momentary policy to change in an attacker-desired manner (a rough sketch of this bi-level loop follows part (2) below). We also demonstrate the transferability of the attack strategy: a strategy learned on a proxy agent can be transferred to poison other victims' policies in the same task, even when those victims adopt different learning algorithms and policy models. Compared with existing reward-poisoning attacks, experimental results show that TEPA efficiently forces a victim to learn an attacker-desired policy through minimal changes to the training environment. Thanks to its transferability, TEPA is also empirically effective at poisoning a black-box RL agent as well as a population of RL agents.

(2) Double-Black-Box Environment Poisoning Attack: We propose a Double-Black-Box Environment Poisoning Attack (DBB-EPA), which requires minimal prior knowledge of the RL system. DBB-EPA assumes only the capability to alter environment hyper-parameters and seeks to compel the policy of a black-box RL agent in a black-box training environment. To this end, we first investigate how to infer the internal information of the RL system and then learn an adaptive attack strategy based on an approximation of the attack objective. Empirical studies demonstrate that DBB-EPA is effective against both tabular-RL and deep-RL agents, in discrete as well as continuous state domains. We conclude that DBB-EPA poses a more realistic threat to complex RL systems than TEPA, which is restricted to white-box discrete environments.
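The bi-level structure that TEPA instantiates (and that DBB-EPA approximates without white-box access) can be illustrated with a rough, self-contained sketch: an inner level in which the victim trains as usual, and an outer level in which the attacker observes the victim's momentary greedy policy and applies a small hyper-parameter change whenever that policy still disagrees with the attacker's target. Everything below (the chain MDP, the `slip` hyper-parameter, and the fixed-step attack rule) is an illustrative assumption, not the algorithm from the thesis.

```python
# Illustrative sketch only: a hypothetical environment-poisoning loop in the
# spirit of TEPA/DBB-EPA, not the thesis's actual attack algorithm.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 5, 2          # tiny chain MDP with reward at the right end
TARGET_STATE, TARGET_ACTION = 2, 0  # attacker wants action 0 chosen in state 2

def step(s, a, slip):
    """Environment whose poisonable hyper-parameter `slip` is the
    probability that the chosen move is reversed (a causal factor)."""
    move = 1 if a == 1 else -1
    if rng.random() < slip:
        move = -move
    s2 = int(np.clip(s + move, 0, N_STATES - 1))
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

def q_learn_epoch(Q, slip, episodes=30, alpha=0.1, gamma=0.9, eps=0.1):
    """Inner level: the victim runs ordinary epsilon-greedy Q-learning."""
    for _ in range(episodes):
        s = 0
        for _ in range(20):
            a = int(rng.integers(N_ACTIONS)) if rng.random() < eps else int(Q[s].argmax())
            s2, r = step(s, a, slip)
            Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
            s = s2
    return Q

# Outer level: after each training epoch the attacker inspects the victim's
# momentary greedy policy and, if it still disagrees with the target, applies
# a small change to `slip` ("adaptive and minimal" poisoning).
slip, Q = 0.05, np.zeros((N_STATES, N_ACTIONS))
for epoch in range(40):
    Q = q_learn_epoch(Q, slip)
    if int(Q[TARGET_STATE].argmax()) == TARGET_ACTION:
        break                         # policy compulsion achieved
    slip = min(slip + 0.02, 0.5)      # smallest available poisoning step
print("final slip:", round(slip, 2),
      "greedy action in target state:", int(Q[TARGET_STATE].argmax()))
```

The point of the sketch is the access model: the attacker never touches rewards or replay memory directly, but only perturbs one causal factor of the environment and re-observes the victim's behaviour.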
(3) Policy Resilience to Environment Poisoning Attack: To address the adverse effects of environment-poisoning attacks (i.e., TEPA and DBB-EPA) on an RL agent's policy learning, we propose a policy-resilience mechanism that recovers poisoned policies so as to achieve optimal deployment performance. The mechanism is designed as a federated architecture combined with meta-learning, allowing critical environment-structure knowledge to be extracted and shared efficiently. By leveraging the shared knowledge, agents can quickly grasp the dynamics of their deployment environments and generate imagined trajectories for recovering their poisoned policies. The procedure comprises three stages: preparation, diagnosis, and recovery (a rough sketch appears at the end of this summary). Empirical results show that the mechanism effectively and efficiently recovers policies poisoned by TEPA and DBB-EPA using the shared environment knowledge.

In summary, this thesis studies training-environment poisoning attacks against RL in white-box and black-box settings. These attacks reveal the vulnerability of RL to maliciously manipulated training environments. To protect the performance of RL policies, the thesis further develops a policy-resilience mechanism against the proposed attacks. We hope this thesis serves as a starting point for investigating the security threats that training environments pose to RL policies, paving the way for the development of secure RL-based applications.
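As a rough illustration of the three-stage resilience loop from part (3), the sketch below reduces the federated, meta-learning machinery to pooled transition counts: agents share clean-environment statistics (preparation), a deployed agent flags divergence between local and shared dynamics (diagnosis), and a policy is re-derived from the re-estimated dynamics in place of the poisoned one (recovery). The environment, the divergence threshold, and all names are illustrative assumptions rather than the thesis's implementation.

```python
# Illustrative sketch only: a hypothetical preparation/diagnosis/recovery
# loop in the spirit of the policy-resilience mechanism described above.
import numpy as np

N_STATES, N_ACTIONS = 5, 2
rng = np.random.default_rng(1)

def chain_step(slip):
    """Same toy chain environment as in the attack sketch, parameterised
    by the poisonable hyper-parameter `slip`."""
    def step(s, a):
        move = 1 if a == 1 else -1
        if rng.random() < slip:
            move = -move
        s2 = int(np.clip(s + move, 0, N_STATES - 1))
        return s2, (1.0 if s2 == N_STATES - 1 else 0.0)
    return step

def rollout_counts(env_step, episodes=200):
    """Count observed transitions (s, a) -> s2 under random exploration."""
    C = np.zeros((N_STATES, N_ACTIONS, N_STATES))
    for _ in range(episodes):
        s = 0
        for _ in range(20):
            a = int(rng.integers(N_ACTIONS))
            s2, _ = env_step(s, a)
            C[s, a, s2] += 1
            s = s2
    return C

# Stage 1 -- preparation: federated agents pool clean-environment knowledge
# (averaged transition counts stand in for meta-learned dynamics models).
shared = sum(rollout_counts(chain_step(0.05)) for _ in range(3)) / 3.0
p_shared = shared / shared.sum(-1, keepdims=True).clip(min=1)

# Stage 2 -- diagnosis: a deployed agent compares its local transition
# statistics against the shared model; large divergence flags poisoning.
local = rollout_counts(chain_step(0.4))  # a poisoned deployment environment
p_local = local / local.sum(-1, keepdims=True).clip(min=1)
print("poisoning detected:", bool(np.abs(p_shared - p_local).max() > 0.2))

# Stage 3 -- recovery: instead of trusting the poisoned policy, re-derive one
# from "imagined" transitions of the re-estimated deployment dynamics.
Q, gamma = np.zeros((N_STATES, N_ACTIONS)), 0.9
for _ in range(200):                     # simple value iteration on the model
    for s in range(N_STATES):
        for a in range(N_ACTIONS):
            r = p_local[s, a, N_STATES - 1]   # expected reward: P(reach goal)
            Q[s, a] = r + gamma * (p_local[s, a] * Q.max(1)).sum()
print("recovered greedy policy:", Q.argmax(1))
```

In the thesis the shared knowledge is meta-learned so that agents can adapt to deployment dynamics from few samples; the plain averaging and value iteration here merely stand in for those components.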