Towards theoretically understanding why SGD generalizes better than ADAM in deep learning

It is not yet clear why ADAM-like adaptive gradient algorithms generalize worse than SGD despite training faster. This work aims to explain this generalization gap by analyzing their local convergence behaviors. Specifically, we observe the...

Bibliographic Details
Main Authors:	ZHOU, Pan, FENG, Jiashi, MA, Chao, XIONG, Caiming, HOI, Steven C. H., E, Weinan
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2020
Subjects:	Databases and Information Systems; OS and Networks
Online Access:	https://ink.library.smu.edu.sg/sis_research/8999 https://ink.library.smu.edu.sg/context/sis_research/article/10002/viewcontent/2020_NeurIPS_Adam_Analysis.pdf
Institution:	Singapore Management University

Similar Items