Sample-efficient iterative lower bound optimization of deep reactive policies for planning in continuous MDPs
Recent advances in deep learning have enabled optimization of deep reactive policies (DRPs) for continuous MDP planning by encoding a parametric policy as a deep neural network and exploiting automatic differentiation in an end-to-end model-based gradient descent framework. This approach has proven e...
Main Authors: | LOW, Siow Meng, KUMAR, Akshat, SANNER, Scott |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2022 |
Subjects: | |
Online Access: | https://ink.library.smu.edu.sg/sis_research/7724 https://ink.library.smu.edu.sg/context/sis_research/article/8727/viewcontent/21220_Article_Text_25233_1_2_20220628.pdf |
Institution: | Singapore Management University |
Similar Items
- Sampling based approaches for minimizing regret in uncertain Markov Decision Problems (MDPs)
  by: AHMED, Asrar, et al.
  Published: (2017)
- Event-Detecting Multi-Agent MDPs: Complexity and Constant-Factor Approximation
  by: KUMAR, Akshat, et al.
  Published: (2009)
- Solving long-run average reward robust MDPs via stochastic games
  by: CHATTERJEE, Krishnendu, et al.
  Published: (2024)
- History-Based Controller Design and Optimization for Partially Observable MDPs
  by: KUMAR, Akshat, et al.
  Published: (2015)
- Unleashing Dec-MDPs in Security Games: Enabling Effective Defender Teamwork
  by: Shieh, Eric, et al.
  Published: (2014)