Minimalistic attacks : how little it takes to fool deep reinforcement learning policies
Recent studies have revealed that neural network-based policies can be easily fooled by adversarial examples. However, while most prior works analyze the effects of perturbing every pixel of every frame, assuming white-box access to the policy, in this paper we take a more restrictive view of adversary generation, with the goal of unveiling the limits of a model's vulnerability. In particular, we explore minimalistic attacks by defining three key settings: (1) black-box policy access, where the attacker has access only to the input (state) and output (action probability) of an RL policy; (2) fractional-state adversary, where only a few pixels are perturbed, with the extreme case being a single-pixel adversary; and (3) tactically-chanced attack, where only significant frames are tactically chosen to be attacked. We formulate the adversarial attack to accommodate these three settings and examine its potency on six Atari games against four fully trained state-of-the-art policies. In Breakout, for example, we find, surprisingly, that (i) all policies show significant performance degradation when merely 0.01% of the input state is modified, and (ii) the policy trained by DQN is completely deceived when only 1% of the frames are perturbed.
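To make the three settings above concrete for readers skimming this record, the following is a minimal, hypothetical sketch. It is not the algorithm from the paper: the policy stub, the random single-pixel search, the confidence-threshold frame selection, and the names `policy_action_probs`, `single_pixel_attack`, and `should_attack` are all assumptions introduced here purely for illustration.

```python
# Hypothetical sketch only: this is NOT the attack algorithm from the paper.
# It illustrates the three settings from the abstract with a dummy policy:
#   (1) black-box access   - the attacker only queries action probabilities,
#   (2) fractional-state   - a single pixel of the state is perturbed,
#   (3) tactically chanced - only "significant" (here: high-confidence) frames are attacked.
import numpy as np


def policy_action_probs(state: np.ndarray) -> np.ndarray:
    """Stand-in for a trained Atari policy. In the black-box setting the attacker
    may only call this function and read the returned action-probability vector."""
    rng = np.random.default_rng(int(state.sum()) % (2**32))  # deterministic dummy
    logits = rng.normal(size=4)                               # 4 discrete actions
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


def single_pixel_attack(state: np.ndarray, num_candidates: int = 128, seed: int = 0):
    """Fractional-state adversary in its extreme form: change exactly one pixel.
    A naive random search keeps the candidate that most reduces the probability
    of the action the policy originally preferred."""
    rng = np.random.default_rng(seed)
    clean_probs = policy_action_probs(state)
    target_action = int(np.argmax(clean_probs))
    best_state, best_prob = state, float(clean_probs[target_action])
    height, width = state.shape
    for _ in range(num_candidates):
        candidate = state.copy()
        row, col = rng.integers(height), rng.integers(width)
        candidate[row, col] = rng.integers(256)               # new 8-bit pixel value
        prob = float(policy_action_probs(candidate)[target_action])
        if prob < best_prob:                                  # stronger attack found
            best_state, best_prob = candidate, prob
    return best_state, best_prob


def should_attack(state: np.ndarray, confidence_threshold: float = 0.8) -> bool:
    """Tactically-chanced attack: spend the perturbation budget only on frames
    where the policy is confident, as a crude proxy for 'significant' frames."""
    return float(policy_action_probs(state).max()) >= confidence_threshold


if __name__ == "__main__":
    frame = np.zeros((84, 84), dtype=np.uint8)                # Atari-style grayscale frame
    if should_attack(frame):
        _, prob = single_pixel_attack(frame)
        print(f"target-action probability after one-pixel attack: {prob:.3f}")
    else:
        print("frame skipped: policy not confident enough to be worth attacking")
```

The random candidate search is only a placeholder for whatever black-box optimizer one prefers; the single capability it assumes is repeated querying of the policy's action probabilities, which is exactly the access granted in setting (1).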
Saved in: DR-NTU (Nanyang Technological University)
Main Authors: Qu, Xinghua; Sun, Zhu; Ong, Yew-Soon; Gupta, Abhishek; Wei, Pengfei
Other Authors: School of Computer Science and Engineering; School of Electrical and Electronic Engineering; Singapore Institute of Manufacturing Technology
Format: Journal Article
Language: English
Published: 2021
Subjects: Engineering::Computer science and engineering; Reinforcement Learning; Adversarial Attacks
Online Access: https://hdl.handle.net/10356/153700
Institution: Nanyang Technological University
Record ID: sg-ntu-dr.10356-153700
Journal: IEEE Transactions on Cognitive and Developmental Systems
ISSN: 2379-8920
DOI: 10.1109/TCDS.2020.2974509
Version: Accepted version
File format: application/pdf
Citation: Qu, X., Sun, Z., Ong, Y., Gupta, A. & Wei, P. (2020). Minimalistic attacks : how little it takes to fool deep reinforcement learning policies. IEEE Transactions on Cognitive and Developmental Systems. https://dx.doi.org/10.1109/TCDS.2020.2974509
Funding: This work is funded by the National Research Foundation, Singapore under its AI Singapore programme [Award No.: AISG-RP-2018-004] and the Data Science and Artificial Intelligence Research Center (DSAIR) at Nanyang Technological University. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not reflect the views of the National Research Foundation, Singapore.
Rights: © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at https://doi.org/10.1109/TCDS.2020.2974509.