Sample-efficient iterative lower bound optimization of deep reactive policies for planning in continuous MDPs

Recent advances in deep learning have enabled optimization of deep reactive policies (DRPs) for continuous MDP planning by encoding a parametric policy as a deep neural network and exploiting automatic differentiation in an end-to-end model-based gradient descent framework. This approach has proven e...

Bibliographic Details
Main Authors: LOW, Siow Meng, KUMAR, Akshat, SANNER, Scott
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2022
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/7724
https://ink.library.smu.edu.sg/context/sis_research/article/8727/viewcontent/21220_Article_Text_25233_1_2_20220628.pdf
Institution: Singapore Management University

Similar Items