Minimalistic attacks : how little it takes to fool deep reinforcement learning policies

Recent studies have revealed that neural network-based policies can be easily fooled by adversarial examples. However, while most prior works analyze the effects of perturbing every pixel of every frame under the assumption of white-box policy access, in this paper we take a more restrictive view of adversary generation, with the goal of unveiling the limits of a model's vulnerability. In particular, we explore minimalistic attacks by defining three key settings: (1) black-box policy access, where the attacker has access only to the input (state) and output (action probability) of an RL policy; (2) fractional-state adversary, where only a few pixels are perturbed, with the extreme case being a single-pixel adversary; and (3) tactically-chanced attack, where only significant frames are tactically chosen to be attacked. We formulate the adversarial attack to accommodate these three settings and explore its potency on six Atari games, examining four fully trained state-of-the-art policies. In Breakout, for example, we surprisingly find that (i) all policies show significant performance degradation when merely 0.01% of the input state is modified, and (ii) the policy trained by DQN is completely deceived when only 1% of frames are perturbed.
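
The three minimalistic settings above can be made concrete with a short sketch. The example below is illustrative only, not the authors' actual attack formulation: the toy linear policy, the random-search perturbation loop, and the probability-gap threshold tau are all assumptions introduced for demonstration.

```python
import numpy as np

# Minimal sketch of the three settings named in the abstract. The toy
# linear policy, the random-search optimizer, and the probability-gap
# threshold are illustrative assumptions, not the paper's formulation.

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 84 * 84))  # toy linear "policy" over an 84x84 frame


def policy_probs(state):
    """Black-box policy access: only the input (state) and output (action
    probabilities) are observable; the network internals are opaque."""
    logits = W @ (state.reshape(-1) / 255.0)
    e = np.exp(logits - logits.max())
    return e / e.sum()


def single_pixel_attack(state, n_queries=400, value=255.0):
    """Fractional-state adversary in its extreme form: perturb one pixel.
    Random search keeps the pixel change that most reduces the probability
    of the action the policy originally preferred."""
    base = policy_probs(state)
    target = int(np.argmax(base))
    best_state, best_prob = state, base[target]
    h, w = state.shape
    for _ in range(n_queries):
        y, x = rng.integers(h), rng.integers(w)
        candidate = state.copy()
        candidate[y, x] = value  # a single perturbed pixel
        p = policy_probs(candidate)[target]
        if p < best_prob:
            best_state, best_prob = candidate, p
    return best_state


def is_significant(state, tau=0.5):
    """Tactically-chanced attack: spend the query budget only on frames
    where the policy is decisive (tau is an assumed significance threshold)."""
    p = policy_probs(state)
    return (p.max() - p.min()) > tau


frame = rng.integers(0, 256, size=(84, 84)).astype(float)
adv_frame = single_pixel_attack(frame) if is_significant(frame) else frame
```

Under these assumptions, the attack touches a single pixel out of 84 × 84 = 7,056 (about 0.014% of the state), roughly the 0.01% perturbation scale reported in the abstract, and it skips any frame the significance test deems unimportant.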


Bibliographic Details
Main Authors: Qu, Xinghua, Sun, Zhu, Ong, Yew-Soon, Gupta, Abhishek, Wei, Pengfei
Other Authors: School of Computer Science and Engineering; School of Electrical and Electronic Engineering; Singapore Institute of Manufacturing Technology
Format: Article
Language: English
Published: 2021
Subjects: Engineering::Computer science and engineering; Reinforcement Learning; Adversarial Attacks
Online Access:https://hdl.handle.net/10356/153700
Institution: Nanyang Technological University
Published in: IEEE Transactions on Cognitive and Developmental Systems
Citation: Qu, X., Sun, Z., Ong, Y., Gupta, A. & Wei, P. (2020). Minimalistic attacks : how little it takes to fool deep reinforcement learning policies. IEEE Transactions on Cognitive and Developmental Systems. https://dx.doi.org/10.1109/TCDS.2020.2974509
ISSN: 2379-8920
DOI: 10.1109/TCDS.2020.2974509
Version: Accepted version
Format: application/pdf
Funding: National Research Foundation (NRF), Singapore. This work is funded by the National Research Foundation, Singapore under its AI Singapore programme [Award No.: AISG-RP-2018-004] and the Data Science and Artificial Intelligence Research Center (DSAIR) at Nanyang Technological University. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not reflect the views of the National Research Foundation, Singapore.
Rights: © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/TCDS.2020.2974509.