Reducing estimation bias via triplet-average deep deterministic policy gradient

The overestimation caused by function approximation is a well-known property in Q-learning algorithms, especially in single-critic models, which leads to poor performance in practical tasks. However, the opposite property, underestimation, which often occurs in Q-learning methods with double critics...

Bibliographic Details
Main Authors: WU, Dongming, DONG, Xingping, SHEN, Jianbing, HOI, Steven C. H.
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2020
Online Access: https://ink.library.smu.edu.sg/sis_research/5920
https://ink.library.smu.edu.sg/context/sis_research/article/6923/viewcontent/tnnls19ReducingBias_av.pdf
Institution: Singapore Management University