Towards theoretically understanding why SGD generalizes better than ADAM in deep learning
It is not yet clear why Adam-like adaptive gradient algorithms suffer worse generalization performance than SGD despite their faster training speed. This work aims to explain this generalization gap by analyzing the local convergence behaviors of the two algorithm families. Specifically, we observe the...
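For context on the two optimizer families the abstract contrasts, the textbook update rules can be sketched as follows. This is a standard-form illustration, not code from the paper; the step sizes and decay rates are common illustrative defaults.

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    """Plain SGD: move against the gradient at a fixed rate."""
    return w - lr * grad

def adam_step(w, grad, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: rescale each coordinate by running moment estimates of the gradient."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad       # first moment (running mean)
    v = b2 * v + (1 - b2) * grad ** 2  # second moment (running uncentered variance)
    m_hat = m / (1 - b1 ** t)          # bias corrections for the zero-initialized moments
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v, t)

# Both optimizers minimizing f(w) = w**2, whose gradient is 2*w.
w_sgd = 1.0
for _ in range(50):
    w_sgd = sgd_step(w_sgd, 2 * w_sgd)

w_adam, adam_state = 1.0, (0.0, 0.0, 0)
for _ in range(200):
    w_adam, adam_state = adam_step(w_adam, 2 * w_adam, adam_state)
```

The per-coordinate division by `sqrt(v_hat)` is what makes Adam's effective step size adaptive, and it is this adaptation whose local convergence behavior the paper analyzes against SGD's fixed-rate update.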
Main Authors:
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2020
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/8999 https://ink.library.smu.edu.sg/context/sis_research/article/10002/viewcontent/2020_NeurIPS_Adam_Analysis.pdf
Institution: Singapore Management University