Regret-based defense in adversarial reinforcement learning

Deep Reinforcement Learning (DRL) policies are vulnerable to adversarial noise in observations, which can have disastrous consequences in safety-critical environments. For instance, a self-driving car that receives adversarially perturbed sensory observations about traffic signs (e.g., a stop sign physically altered to be perceived as a speed limit sign) can cause a fatal accident. Leading existing approaches for making RL algorithms robust to an observation-perturbing adversary have focused on (a) regularization approaches that make expected value objectives robust by adding adversarial loss terms; or (b) employing "maximin" (i.e., maximizing the minimum value) notions of robustness. While regularization approaches are adept at reducing the probability of successful attacks, their performance drops significantly when an attack is successful. On the other hand, maximin objectives, while robust, can be extremely conservative. To this end, we focus on optimizing a well-studied robustness objective, namely regret. To ensure the solutions provided are not too conservative, we optimize an approximation of regret using three different methods. We demonstrate that our methods outperform existing best approaches for adversarial RL problems across a variety of standard benchmarks from the literature.
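
The record does not reproduce the paper's formal definitions. As a rough sketch in assumed notation (not necessarily the authors' formulation), the two objectives contrasted in the abstract can be written as follows, where \mathcal{N} is a set of admissible observation-perturbing adversaries \nu and V^{\pi}_{\nu} is the expected return of policy \pi when its observations are perturbed by \nu:

% Sketch only: notation and the exact form of the regret objective are assumptions
% for illustration, not taken from the paper.
\[
  \text{maximin:} \quad \pi^{*} \in \arg\max_{\pi} \; \min_{\nu \in \mathcal{N}} V^{\pi}_{\nu}
\]
\[
  \text{regret:} \quad \pi^{*} \in \arg\min_{\pi} \; \max_{\nu \in \mathcal{N}} \Big( \max_{\pi'} V^{\pi'}_{\nu} \; - \; V^{\pi}_{\nu} \Big)
\]

Under the maximin objective only the worst-case return matters, which is why it can be extremely conservative; a regret objective instead penalizes the gap to the best return achievable under each perturbation, so low-regret policies can stay competitive when the attack is weak or absent.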

Bibliographic Details
Main Authors: BELAIRE, Roman; VARAKANTHAM, Pradeep; NGUYEN, Thanh Hong; LO, David
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2024
Subjects: Robust Reinforcement Learning; Adversarial Robustness; Regret; Software Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/9243
https://ink.library.smu.edu.sg/context/sis_research/article/10243/viewcontent/p2633.pdf
DOI: 10.5555/3635637.3663250
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Research Collection School Of Computing and Information Systems (InK@SMU)
Publication Date: 2024-05-01
Institution: Singapore Management University
Record ID: sg-smu-ink.sis_research-10243