Towards theoretically understanding why SGD generalizes better than ADAM in deep learning
It is not yet clear why ADAM-like adaptive gradient algorithms suffer from worse generalization performance than SGD despite their faster training speed. This work aims to provide an understanding of this generalization gap by analyzing their local convergence behaviors. Specifically, we observe the...
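For context, the sketch below (illustrative, not taken from the paper) writes out the standard SGD and Adam update rules so the contrast between a single shared step size and per-coordinate adaptive scaling is concrete; the toy loss, initial values, and hyperparameters are placeholders.

```python
# A minimal sketch (not from the paper) contrasting a plain SGD update with an
# Adam-style adaptive update; hyperparameter values here are illustrative only.
import numpy as np

def sgd_step(w, grad, lr=0.1):
    # SGD rescales every coordinate of the gradient by the same step size.
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam keeps exponential moving averages of the gradient (m) and of its
    # elementwise square (v), then scales each coordinate by its own magnitude
    # estimate -- this per-coordinate adaptivity is what "adaptive gradient" means.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias correction for the first moment
    v_hat = v / (1 - beta2 ** t)   # bias correction for the second moment
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy usage on the quadratic loss 0.5 * ||w||^2, whose gradient at w is w itself.
w_sgd = np.array([1.0, -2.0])
w_adam = w_sgd.copy()
m, v = np.zeros_like(w_adam), np.zeros_like(w_adam)
for t in range(1, 6):
    w_sgd = sgd_step(w_sgd, w_sgd)
    w_adam, m, v = adam_step(w_adam, w_adam, m, v, t)
print(w_sgd, w_adam)
```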
| Main Authors: | ZHOU, Pan; FENG, Jiashi; MA, Chao; XIONG, Caiming; HOI, Steven C. H.; E, Weinan |
| --- | --- |
| Format: | text |
| Language: | English |
| Published: | Institutional Knowledge at Singapore Management University, 2020 |
| Online Access: | https://ink.library.smu.edu.sg/sis_research/8999 ; https://ink.library.smu.edu.sg/context/sis_research/article/10002/viewcontent/2020_NeurIPS_Adam_Analysis.pdf |
| Institution: | Singapore Management University |
Similar Items
- Towards understanding why Lookahead generalizes better than SGD and beyond
  by: ZHOU, Pan, et al.
  Published: (2021)
- Empirical risk landscape analysis for understanding deep neural networks
  by: ZHOU, Pan, et al.
  Published: (2018)
- Theory-inspired path-regularized differential network architecture search
  by: ZHOU, Pan, et al.
  Published: (2020)
- New insight into hybrid stochastic gradient descent: Beyond with-replacement sampling and convexity
  by: ZHOU, Pan, et al.
  Published: (2018)
- Multi-target deep neural networks: Theoretical analysis and implementation
  by: ZENG, Zeng, et al.
  Published: (2018)