Manipulating adaptive processes by constructive training-time attacks


Bibliographic Details
Main Author: Bector, Ridhima
Other Authors: Zinovi Rabinovich
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2024
Subjects:
Online Access:https://hdl.handle.net/10356/173190
Institution: Nanyang Technological University
Description
Summary: Over the last decade, Reinforcement Learning (RL) has dramatically altered the sequential decision-making landscape and has produced several AI breakthroughs such as AlphaGo, Pluribus, and AlphaStar, RL-based Artificial Intelligence (AI) systems that have achieved superhuman proficiency in the challenging domains of Go, Poker, and StarCraft II. However, the recent application of RL solutions in safety-critical industries such as construction, aviation, and autonomous driving has necessitated research on the safety and robustness of the developed RL technologies. The focus of the current research is to design, develop, execute, and analyse attacks on RL agents (victims) in order to study weaknesses of RL technologies and thereby enable the development of safe and robust RL solutions in the future.

Attacks in RL can be classified as destructive or constructive based on the intent of the attack. Destructive attacks aim to degrade a policy’s performance, while constructive attacks strive to force the victim RL agent to learn a target policy that it would not learn by itself in the absence of the attack. The most insidious constructive attacks are training-time attacks that “pre-program” back-doors and behavioural triggers into a victim RL agent's strategy while the victim agent trains to learn its task. These behavioural triggers can then be used during deployment to make the victim RL agent carry out the target behaviour.

Two critical aspects that have not yet been investigated under constructive training-time attacks are attacks on RL agent collectives, and attacks that force adoption of behaviours that are non-optimal with respect to the victim RL agent's objectives (non-optimal target behaviours). Studying RL agent collectives is crucial because industry-level safety-critical systems are composed of multiple decision-making entities that simultaneously learn to carry out the same or different tasks. Studying attacks that push RL agents toward non-optimal target behaviours is equally imperative, as it demonstrates the extent to which RL agents can be manipulated and hence highlights the severity of RL's safety problem. This thesis therefore investigates these two novel research directions within the constructive training-time attack paradigm.

In this research, each constructive training-time attack is designed as a sequence of environment (re)parameterisations (poisonings) that pushes the RL agent (collective or individual) toward the target behaviour while minimising effective environment modulation (a schematic sketch of this attack loop is given below).

The first research direction investigates attacks on populations of reinforcement learning agents. Herein, a constructive training-time attack on agent collectives is developed to overcome individual differences between member agents and lead the entire population to the same target behaviour. Additionally, the attack is made agnostic to the size of the victim population, which enables the learned attack to transfer to victim collectives of different sizes during deployment. The developed method is demonstrated on populations of independent learners in “ghost” environments, where learners do not interact with or perceive each other, as well as in shared environments, where social learners are mutually aware of each other and learn with or without an individual learning component.
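The sketch below is a minimal, purely illustrative rendering of the attack design described above: the attacker interleaves small environment (re)parameterisations with the victim's ordinary training, prioritising progress toward the target behaviour over the amount of modulation dispensed. The interfaces used here (env.get_params/set_params, attacker.propose/update, victim.train_one_epoch, the probe-state behaviour gap) are assumptions made for illustration, not the thesis's actual implementation; in the thesis the attack policy is itself learned (e.g., with gammaDDPG in the single-agent setting), for which attacker.update stands in.

```python
# Illustrative sketch of a constructive training-time attack realised as a
# sequence of environment (re)parameterisations (poisonings) applied while
# the victim trains. All object interfaces are hypothetical placeholders.

import numpy as np

def behaviour_gap(victim_policy, target_policy, probe_states):
    """Fraction of probe states on which the victim's action deviates from
    the attacker's target behaviour (a simple attack-accuracy proxy)."""
    return float(np.mean([victim_policy(s) != target_policy(s)
                          for s in probe_states]))

def poison_training(victim, env, attacker, target_policy, probe_states,
                    epochs=200, effort_weight=0.1):
    """Alternate victim learning with attacker environment poisonings."""
    base_params = env.get_params()                  # nominal environment
    for _ in range(epochs):
        # Attacker proposes a (re)parameterisation from the victim's behaviour.
        delta = attacker.propose(victim.policy, probe_states)
        env.set_params(base_params + delta)

        # Victim performs one ordinary training epoch in the poisoned environment.
        victim.train_one_epoch(env)

        # Scalarised attacker objective: attack accuracy first, minimal
        # effective environment modulation second.
        gap = behaviour_gap(victim.policy, target_policy, probe_states)
        effort = float(np.linalg.norm(delta))
        attacker.update(reward=-(gap + effort_weight * effort))
    return victim.policy
```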
From the attack perspective, a highly practical ultra black-box setting is introduced and pursued. Herein, the attacker learns with the aid of each victim's interaction trace across multiple policies (across-policy interaction trace). These across-policy traces are used for both attack conditioning and evaluation. The resulting uncertainty in population behaviour is managed via a novel Wasserstein distance-based Gaussian embedding of the behaviours detected within the victim population (a computational sketch of such an embedding distance is given at the end of this summary). The experiments show: a) feasibility, i.e., despite the uncertainty, the attack forces population-wide adoption of the target behaviour; and b) efficacy, i.e., the attack is size-agnostic and transferable.

The second research direction investigates extreme attacks on single reinforcement learning agents. Herein, a constructive training-time attack that forces adoption of a non-optimal target behaviour is learned using a novel reinforcement-learning algorithm called gamma Deep Deterministic Policy Gradient (gammaDDPG). gammaDDPG is designed as a dual-priority, dual-objective RL optimisation algorithm, as the attacker must maximise attack accuracy while minimising dispensed effort. Since a certain level of effort is imperative in pushing a learning RL agent toward non-optimal behaviours, this work regards maximising attack accuracy as the higher-priority objective and minimising attacker effort as the lower-priority objective. The reward and the reward discounting factor are used to encode and prioritise the attack objectives in gammaDDPG. The reward discounting factor dynamically alters the attack policy's planning horizon based on the victim's current behaviour; this improves effort distribution across the attack timeline and reduces the effect of uncertainty in the given partially observable environment (black-box setting). Experiments show that gammaDDPG can efficiently push a given learning RL agent to adopt a non-optimal target behaviour.

The attack methodologies developed in the first and second research directions are sensitive to the size and nature of the victim agent's environment. Therefore, the third and last work of this thesis extends the developed attack methodologies to large discrete/continuous victim environments through a latent victim state-space model as well as macro attack actions. The latent victim state-space model can be trained to map any given large discrete or continuous victim environment to a small discrete space, while macro attack actions reduce the size of the attacker's action space. Experiments show that, in large discrete victim environments, the latent victim state-space model and macro attack actions improve the attacker's efficiency without degrading its performance.

This thesis is organised into seven chapters. The first chapter introduces the thesis by presenting the background, motivation, problem statements, scope, objectives, contributions, applications, and organisation of this research. The second chapter discusses the related literature, while the third chapter describes the fundamental concepts and technologies used to develop the methodologies of this research. Chapters four and five investigate the first and second research directions, while chapter six presents mechanisms to scale the developed methodologies to larger victim environments. Lastly, chapter seven concludes the thesis by discussing the contributions, limitations, and future directions of the conducted research.
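As a supplementary illustration of the Wasserstein distance-based Gaussian embedding referenced in the first research direction: if two sets of victim behaviours are each summarised by a Gaussian in some behaviour-feature space, the 2-Wasserstein distance between the two Gaussians has a standard closed form, which the sketch below computes. How behaviours are actually mapped to Gaussians in the thesis, the feature space, and all names here are assumptions made for illustration only.

```python
# Closed-form squared 2-Wasserstein distance between two Gaussian behaviour
# embeddings N(mu1, S1) and N(mu2, S2):
#   W2^2 = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S2^(1/2) S1 S2^(1/2))^(1/2))

import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2_squared(mu1, S1, mu2, S2):
    """Squared 2-Wasserstein distance between N(mu1, S1) and N(mu2, S2)."""
    S2_half = sqrtm(S2)
    cross = sqrtm(S2_half @ S1 @ S2_half)
    # sqrtm may return tiny imaginary components due to numerical error.
    bures = np.trace(S1 + S2 - 2.0 * np.real(cross))
    return float(np.sum((mu1 - mu2) ** 2) + bures)

# Example: distance between two behaviour embeddings in a 3-D feature space.
mu_a, S_a = np.zeros(3), np.eye(3)
mu_b, S_b = np.ones(3), 2.0 * np.eye(3)
print(gaussian_w2_squared(mu_a, S_a, mu_b, S_b))
```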