FlowPG: Action-constrained policy gradient with normalizing flows
Action-constrained reinforcement learning (ACRL) is a popular approach for solving safety-critical and resource-allocation related decision-making problems. A major challenge in ACRL is to ensure that the agent takes a valid action satisfying the constraints at each RL step. The commonly used approach of adding a projection layer on top of the policy network requires solving an optimization program, which can result in longer training time, slow convergence, and the zero-gradient problem. To address this, we first use a normalizing flow model to learn an invertible, differentiable mapping between the feasible action space and the support of a simple distribution on a latent variable, such as a Gaussian. Second, learning the flow model requires sampling from the feasible action space, which is itself challenging; we develop multiple methods, based on Hamiltonian Monte Carlo and probabilistic sentential decision diagrams, for such action sampling under both convex and non-convex constraints. Third, we integrate the learned normalizing flow with the DDPG algorithm. By design, a well-trained normalizing flow transforms the policy output into a valid action without requiring an optimization solver. Empirically, our approach results in significantly fewer constraint violations (up to an order of magnitude on several instances) and is multiple times faster on a variety of continuous control tasks.
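The sketch below is a minimal, hedged illustration of the core idea described in the abstract, not the authors' FlowPG implementation: an invertible, differentiable coupling layer maps a latent point `z` (which a DDPG-style policy could output on the support of a simple base distribution) to an action, so acting requires only a forward pass through the flow rather than an optimization-based projection. All names (`AffineCoupling`, `flow`, `z`) are hypothetical, and PyTorch is assumed purely for illustration.

```python
# Minimal sketch (assumed PyTorch; not the authors' FlowPG code) of mapping a
# latent policy output to an action through an invertible flow layer.
import torch
import torch.nn as nn


class AffineCoupling(nn.Module):
    """One RealNVP-style affine coupling layer: invertible and differentiable."""

    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.half = dim // 2
        # Conditioner network producing per-dimension log-scale and shift.
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Latent -> action space: transform the second half conditioned on the first.
        z1, z2 = z[:, :self.half], z[:, self.half:]
        log_s, t = self.net(z1).chunk(2, dim=-1)
        return torch.cat([z1, z2 * torch.exp(log_s) + t], dim=-1)

    def inverse(self, a: torch.Tensor) -> torch.Tensor:
        # Action -> latent: used when fitting the flow on sampled feasible actions.
        a1, a2 = a[:, :self.half], a[:, self.half:]
        log_s, t = self.net(a1).chunk(2, dim=-1)
        return torch.cat([a1, (a2 - t) * torch.exp(-log_s)], dim=-1)


# Hypothetical acting-time usage: the policy emits a latent point on the support
# of the base distribution; the trained flow maps it to an action with no
# optimization solver in the loop.
flow = AffineCoupling(dim=2)
z = torch.tanh(torch.randn(1, 2))  # stand-in for a DDPG policy's latent output
action = flow(z)
```

In the paper's setting, such a flow would first be trained on actions sampled from the feasible region (e.g., via Hamiltonian Monte Carlo or probabilistic sentential decision diagrams, as the abstract notes) so that its forward map covers the feasible set; the untrained layer above does not enforce any constraint by itself.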
Main Authors: BRAHMANAGE JANAKA CHATHURANGA THILAKARATHNA; LING, Jiajing; KUMAR, Akshat
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2023
Subjects: Artificial Intelligence and Robotics; Databases and Information Systems
Online Access: https://ink.library.smu.edu.sg/sis_research/8551
https://ink.library.smu.edu.sg/context/sis_research/article/9554/viewcontent/11351_flowpg_action_constrained_poli.pdf
Institution: Singapore Management University
id
sg-smu-ink.sis_research-9554
record_format
dspace
spelling
sg-smu-ink.sis_research-9554 2024-01-22T14:48:10Z FlowPG: Action-constrained policy gradient with normalizing flows BRAHMANAGE JANAKA CHATHURANGA THILAKARATHNA; LING, Jiajing; KUMAR, Akshat. Action-constrained reinforcement learning (ACRL) is a popular approach for solving safety-critical and resource-allocation related decision-making problems. A major challenge in ACRL is to ensure that the agent takes a valid action satisfying the constraints at each RL step. The commonly used approach of adding a projection layer on top of the policy network requires solving an optimization program, which can result in longer training time, slow convergence, and the zero-gradient problem. To address this, we first use a normalizing flow model to learn an invertible, differentiable mapping between the feasible action space and the support of a simple distribution on a latent variable, such as a Gaussian. Second, learning the flow model requires sampling from the feasible action space, which is itself challenging; we develop multiple methods, based on Hamiltonian Monte Carlo and probabilistic sentential decision diagrams, for such action sampling under both convex and non-convex constraints. Third, we integrate the learned normalizing flow with the DDPG algorithm. By design, a well-trained normalizing flow transforms the policy output into a valid action without requiring an optimization solver. Empirically, our approach results in significantly fewer constraint violations (up to an order of magnitude on several instances) and is multiple times faster on a variety of continuous control tasks. 2023-12-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/8551 https://ink.library.smu.edu.sg/context/sis_research/article/9554/viewcontent/11351_flowpg_action_constrained_poli.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Artificial Intelligence and Robotics; Databases and Information Systems
institution
Singapore Management University
building
SMU Libraries
continent
Asia
country
Singapore
content_provider
SMU Libraries
collection
InK@SMU
language
English
topic
Artificial Intelligence and Robotics; Databases and Information Systems
spellingShingle
Artificial Intelligence and Robotics; Databases and Information Systems; BRAHMANAGE JANAKA CHATHURANGA THILAKARATHNA; LING, Jiajing; KUMAR, Akshat; FlowPG: Action-constrained policy gradient with normalizing flows
description
Action-constrained reinforcement learning (ACRL) is a popular approach for solving safety-critical and resource-allocation related decision-making problems. A major challenge in ACRL is to ensure that the agent takes a valid action satisfying the constraints at each RL step. The commonly used approach of adding a projection layer on top of the policy network requires solving an optimization program, which can result in longer training time, slow convergence, and the zero-gradient problem. To address this, we first use a normalizing flow model to learn an invertible, differentiable mapping between the feasible action space and the support of a simple distribution on a latent variable, such as a Gaussian. Second, learning the flow model requires sampling from the feasible action space, which is itself challenging; we develop multiple methods, based on Hamiltonian Monte Carlo and probabilistic sentential decision diagrams, for such action sampling under both convex and non-convex constraints. Third, we integrate the learned normalizing flow with the DDPG algorithm. By design, a well-trained normalizing flow transforms the policy output into a valid action without requiring an optimization solver. Empirically, our approach results in significantly fewer constraint violations (up to an order of magnitude on several instances) and is multiple times faster on a variety of continuous control tasks.
format
text
author
BRAHMANAGE JANAKA CHATHURANGA THILAKARATHNA; LING, Jiajing; KUMAR, Akshat
author_facet
BRAHMANAGE JANAKA CHATHURANGA THILAKARATHNA; LING, Jiajing; KUMAR, Akshat
author_sort
BRAHMANAGE JANAKA CHATHURANGA THILAKARATHNA,
title
FlowPG: Action-constrained policy gradient with normalizing flows
title_short
FlowPG: Action-constrained policy gradient with normalizing flows
title_full
FlowPG: Action-constrained policy gradient with normalizing flows
title_fullStr
FlowPG: Action-constrained policy gradient with normalizing flows
title_full_unstemmed
FlowPG: Action-constrained policy gradient with normalizing flows
title_sort
flowpg: action-constrained policy gradient with normalizing flows
publisher
Institutional Knowledge at Singapore Management University
publishDate
2023
url
https://ink.library.smu.edu.sg/sis_research/8551
https://ink.library.smu.edu.sg/context/sis_research/article/9554/viewcontent/11351_flowpg_action_constrained_poli.pdf
_version_
1789483263722520576