Imitate the good and avoid the bad: An incremental approach to safe reinforcement learning

A popular framework for enforcing safe actions in Reinforcement Learning (RL) is Constrained RL, where trajectory based constraints on expected cost (or other cost measures) are employed to enforce safety and more importantly these constraints are enforced while maximizing expected reward. Most re...

Bibliographic Details
Main Authors:	HOANG, Minh Huy; MAI, Tien; VARAKANTHAM, Pradeep
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Safe reinforcement learning; Imitation learning; Artificial Intelligence and Robotics
Online Access:	https://ink.library.smu.edu.sg/sis_research/9622 https://ink.library.smu.edu.sg/context/sis_research/article/10622/viewcontent/29136_Article_Text_33190_1_2_20240324__1_.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-10622
record_format	dspace
spelling	sg-smu-ink.sis_research-10622 2024-11-23T15:36:53Z Imitate the good and avoid the bad: An incremental approach to safe reinforcement learning HOANG, Minh Huy MAI, Tien VARAKANTHAM, Pradeep A popular framework for enforcing safe actions in Reinforcement Learning (RL) is Constrained RL, where trajectory based constraints on expected cost (or other cost measures) are employed to enforce safety and more importantly these constraints are enforced while maximizing expected reward. Most recent approaches for solving Constrained RL convert the trajectory based cost constraint into a surrogate problem that can be solved using minor modifications to RL methods. A key drawback with such approaches is an over- or underestimation of the cost constraint at each state. Therefore, we provide an approach that does not modify the trajectory based cost constraint and instead imitates “good” trajectories and avoids “bad” trajectories generated from incrementally improving policies. We employ an oracle that utilizes a reward threshold (which is varied with learning) and the overall cost constraint to label trajectories as “good” or “bad”. A key advantage of our approach is that we are able to work from any starting policy or set of trajectories and improve on it. In an exhaustive set of experiments, we demonstrate that our approach is able to outperform top benchmark approaches for solving Constrained RL problems, with respect to expected cost, CVaR cost, or even unknown cost constraints. 2024-02-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/9622 https://ink.library.smu.edu.sg/context/sis_research/article/10622/viewcontent/29136_Article_Text_33190_1_2_20240324__1_.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Safe reinforcement learning Imitation learning Artificial Intelligence and Robotics
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Safe reinforcement learning Imitation learning Artificial Intelligence and Robotics
spellingShingle	Safe reinforcement learning Imitation learning Artificial Intelligence and Robotics HOANG, Minh Huy MAI, Tien VARAKANTHAM, Pradeep Imitate the good and avoid the bad: An incremental approach to safe reinforcement learning
description	A popular framework for enforcing safe actions in Reinforcement Learning (RL) is Constrained RL, where trajectory based constraints on expected cost (or other cost measures) are employed to enforce safety and more importantly these constraints are enforced while maximizing expected reward. Most recent approaches for solving Constrained RL convert the trajectory based cost constraint into a surrogate problem that can be solved using minor modifications to RL methods. A key drawback with such approaches is an over- or underestimation of the cost constraint at each state. Therefore, we provide an approach that does not modify the trajectory based cost constraint and instead imitates “good” trajectories and avoids “bad” trajectories generated from incrementally improving policies. We employ an oracle that utilizes a reward threshold (which is varied with learning) and the overall cost constraint to label trajectories as “good” or “bad”. A key advantage of our approach is that we are able to work from any starting policy or set of trajectories and improve on it. In an exhaustive set of experiments, we demonstrate that our approach is able to outperform top benchmark approaches for solving Constrained RL problems, with respect to expected cost, CVaR cost, or even unknown cost constraints.
format	text
author	HOANG, Minh Huy MAI, Tien VARAKANTHAM, Pradeep
author_facet	HOANG, Minh Huy MAI, Tien VARAKANTHAM, Pradeep
author_sort	HOANG, Minh Huy
title	Imitate the good and avoid the bad: An incremental approach to safe reinforcement learning
title_short	Imitate the good and avoid the bad: An incremental approach to safe reinforcement learning
title_full	Imitate the good and avoid the bad: An incremental approach to safe reinforcement learning
title_fullStr	Imitate the good and avoid the bad: An incremental approach to safe reinforcement learning
title_full_unstemmed	Imitate the good and avoid the bad: An incremental approach to safe reinforcement learning
title_sort	imitate the good and avoid the bad: an incremental approach to safe reinforcement learning
publisher	Institutional Knowledge at Singapore Management University
publishDate	2024
url	https://ink.library.smu.edu.sg/sis_research/9622 https://ink.library.smu.edu.sg/context/sis_research/article/10622/viewcontent/29136_Article_Text_33190_1_2_20240324__1_.pdf
_version_	1816859169567801344
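
Illustrative note on the description field: the abstract mentions an oracle that uses a reward threshold and the overall cost constraint to label trajectories as "good" or "bad". The record does not give the exact rule, so the following is only a minimal sketch under stated assumptions: a trajectory counts as "good" when its undiscounted total reward meets the threshold and its undiscounted total cost stays within the limit. The names `Trajectory`, `label_trajectories`, `reward_threshold`, and `cost_limit` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trajectory:
    """Per-step rewards and costs of one rollout (hypothetical container)."""
    rewards: List[float]
    costs: List[float]


def label_trajectories(
    trajectories: List[Trajectory],
    reward_threshold: float,
    cost_limit: float,
) -> Tuple[List[Trajectory], List[Trajectory]]:
    """Split trajectories into (good, bad) lists.

    Assumed rule, not taken from the paper: "good" means the total reward
    is at least reward_threshold AND the total cost is at most cost_limit.
    Everything else is labelled "bad".
    """
    good: List[Trajectory] = []
    bad: List[Trajectory] = []
    for traj in trajectories:
        total_reward = sum(traj.rewards)  # undiscounted return (assumption)
        total_cost = sum(traj.costs)      # undiscounted cost (assumption)
        if total_reward >= reward_threshold and total_cost <= cost_limit:
            good.append(traj)
        else:
            bad.append(traj)
    return good, bad
```

In the abstract's setting the reward threshold is varied as learning proceeds, so a caller would presumably invoke such a function repeatedly with an updated `reward_threshold`; how that schedule is chosen is not stated in this record.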

Similar Items