Minimalistic attacks : how little it takes to fool deep reinforcement learning policies

Recent studies have revealed that neural network-based policies can be easily fooled by adversarial examples. However, while most prior works analyze the effects of perturbing every pixel of every frame under the assumption of white-box policy access, in this paper we take a more restrictive view of adversary generation, with the goal of unveiling the limits of a model's vulnerability. In particular, we explore minimalistic attacks by defining three key settings: (1) black-box policy access, where the attacker has access only to the input (state) and output (action probability) of an RL policy; (2) fractional-state adversary, where only a few pixels are perturbed, with the extreme case being a single-pixel adversary; and (3) tactically-chanced attack, where only significant frames are tactically chosen to be attacked. We formulate the adversarial attack to accommodate these three settings and explore its potency on six Atari games, examining four fully trained state-of-the-art policies. In Breakout, for example, we surprisingly find that (i) all policies show significant performance degradation when merely 0.01% of the input state is modified, and (ii) the policy trained by DQN is completely deceived when only 1% of frames are perturbed.
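
The three minimalistic settings above can be made concrete with a short sketch. The example below is illustrative only, not the authors' actual attack formulation: the toy linear policy, the random-search perturbation loop, and the probability-gap threshold tau are all assumptions introduced for demonstration.

```python
import numpy as np

# Minimal sketch of the three settings named in the abstract. The toy
# linear policy, the random-search optimizer, and the probability-gap
# threshold are illustrative assumptions, not the paper's formulation.

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 84 * 84))  # toy linear "policy" over an 84x84 frame


def policy_probs(state):
    """Black-box policy access: only the input (state) and output (action
    probabilities) are observable; the network internals are opaque."""
    logits = W @ (state.reshape(-1) / 255.0)
    e = np.exp(logits - logits.max())
    return e / e.sum()


def single_pixel_attack(state, n_queries=400, value=255.0):
    """Fractional-state adversary in its extreme form: perturb one pixel.
    Random search keeps the pixel change that most reduces the probability
    of the action the policy originally preferred."""
    base = policy_probs(state)
    target = int(np.argmax(base))
    best_state, best_prob = state, base[target]
    h, w = state.shape
    for _ in range(n_queries):
        y, x = rng.integers(h), rng.integers(w)
        candidate = state.copy()
        candidate[y, x] = value  # a single perturbed pixel
        p = policy_probs(candidate)[target]
        if p < best_prob:
            best_state, best_prob = candidate, p
    return best_state


def is_significant(state, tau=0.5):
    """Tactically-chanced attack: spend the query budget only on frames
    where the policy is decisive (tau is an assumed significance threshold)."""
    p = policy_probs(state)
    return (p.max() - p.min()) > tau


frame = rng.integers(0, 256, size=(84, 84)).astype(float)
adv_frame = single_pixel_attack(frame) if is_significant(frame) else frame
```

Under these assumptions, the attack touches a single pixel out of 84 × 84 = 7,056 (about 0.014% of the state), roughly the 0.01% perturbation scale reported in the abstract, and it skips any frame the significance test deems unimportant.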


Bibliographic Details
Main Authors: Qu, Xinghua, Sun, Zhu, Ong, Yew-Soon, Gupta, Abhishek, Wei, Pengfei
Other Authors: School of Computer Science and Engineering; School of Electrical and Electronic Engineering; Singapore Institute of Manufacturing Technology
Format: Article
Language: English
Published: 2021
Subjects: Engineering::Computer science and engineering; Reinforcement Learning; Adversarial Attacks
Online Access:https://hdl.handle.net/10356/153700
Institution: Nanyang Technological University
Published in: IEEE Transactions on Cognitive and Developmental Systems
Citation: Qu, X., Sun, Z., Ong, Y., Gupta, A. & Wei, P. (2020). Minimalistic attacks : how little it takes to fool deep reinforcement learning policies. IEEE Transactions on Cognitive and Developmental Systems. https://dx.doi.org/10.1109/TCDS.2020.2974509
ISSN: 2379-8920
DOI: 10.1109/TCDS.2020.2974509
Version: Accepted version
Format: application/pdf
Funding: National Research Foundation (NRF), Singapore. This work is funded by the National Research Foundation, Singapore under its AI Singapore programme [Award No.: AISG-RP-2018-004] and the Data Science and Artificial Intelligence Research Center (DSAIR) at Nanyang Technological University. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not reflect the views of the National Research Foundation, Singapore.
Rights: © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/TCDS.2020.2974509.