Towards theoretically understanding why SGD generalizes better than ADAM in deep learning

It is not yet clear why ADAM-like adaptive gradient algorithms generalize worse than SGD despite their faster training speed. This work aims to explain this generalization gap by analyzing their local convergence behaviors. Specifically, we observe the...

Bibliographic Details
Main Authors: ZHOU, Pan, FENG, Jiashi, MA, Chao, XIONG, Caiming, HOI, Steven C. H., E, Weinan
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2020
Online Access: https://ink.library.smu.edu.sg/sis_research/8999
https://ink.library.smu.edu.sg/context/sis_research/article/10002/viewcontent/2020_NeurIPS_Adam_Analysis.pdf
Institution: Singapore Management University