Reducing estimation bias via triplet-average deep deterministic policy gradient

The overestimation caused by function approximation is a well-known property in Q-learning algorithms, especially in single-critic models, which leads to poor performance in practical tasks. However, the opposite property, underestimation, which often occurs in Q-learning methods with double critics...

Bibliographic Details
Main Authors:	WU, Dongming, DONG, Xingping, SHEN, Jianbing, HOI, Steven C. H.
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2020
Subjects:	Averaging technology; deep reinforcement learning (DRL); estimation bias; triplet networks; Numerical Analysis and Scientific Computing; Software Engineering; Theory and Algorithms
Online Access:	https://ink.library.smu.edu.sg/sis_research/5920 https://ink.library.smu.edu.sg/context/sis_research/article/6923/viewcontent/tnnls19ReducingBias_av.pdf
Institution:	Singapore Management University