Sample-efficient iterative lower bound optimization of deep reactive policies for planning in continuous MDPs
Recent advances in deep learning have enabled optimization of deep reactive policies (DRPs) for continuous MDP planning by encoding a parametric policy as a deep neural network and exploiting automatic differentiation in an end-to-end model-based gradient descent framework. This approach has proven e...
Main Authors: | LOW, Siow Meng, KUMAR, Akshat, SANNER, Scott |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2022 |
Online Access: | https://ink.library.smu.edu.sg/sis_research/7724 https://ink.library.smu.edu.sg/context/sis_research/article/8727/viewcontent/21220_Article_Text_25233_1_2_20220628.pdf |
Institution: | Singapore Management University |
Similar Items
- Sampling based approaches for minimizing regret in uncertain Markov Decision Problems (MDPs)
  by: AHMED, Asrar, et al.
  Published: (2017)
- Solving long-run average reward robust MDPs via stochastic games
  by: CHATTERJEE, Krishnendu, et al.
  Published: (2024)
- Event-Detecting Multi-Agent MDPs: Complexity and Constant-Factor Approximation
  by: KUMAR, Akshat, et al.
  Published: (2009)
- History-Based Controller Design and Optimization for Partially Observable MDPs
  by: KUMAR, Akshat, et al.
  Published: (2015)
- Unleashing Dec-MDPs in Security Games: Enabling Effective Defender Teamwork
  by: Shieh, Eric, et al.
  Published: (2014)