Towards theoretically understanding why SGD generalizes better than ADAM in deep learning

It is not yet clear why ADAM-like adaptive gradient algorithms generalize worse than SGD despite their faster training speed. This work aims to explain this generalization gap by analyzing the local convergence behaviors of these algorithms. Specifically, we observe the...

Bibliographic Details
Main Authors:	ZHOU, Pan; FENG, Jiashi; MA, Chao; XIONG, Caiming; HOI, Steven C. H.; E, Weinan
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University, 2020
Subjects:	Databases and Information Systems; OS and Networks
Online Access:	https://ink.library.smu.edu.sg/sis_research/8999
	https://ink.library.smu.edu.sg/context/sis_research/article/10002/viewcontent/2020_NeurIPS_Adam_Analysis.pdf
