Towards understanding why Lookahead generalizes better than SGD and beyond

To train networks, the lookahead algorithm [1] updates its fast weights k times via an inner-loop optimizer before updating its slow weights once using the latest fast weights. Any optimizer, e.g. SGD, can serve as the inner-loop optimizer, and the derived lookahead generally enjoys remarkable test p...
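The fast/slow weight loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes plain gradient descent on a toy quadratic as the inner-loop optimizer (a deterministic stand-in for SGD) and the standard Lookahead interpolation rule slow = slow + alpha * (fast - slow). The helper names and the values of k, alpha, lr and outer_steps are hypothetical choices made for the example.

```python
from typing import Callable, List

Vector = List[float]


def gd_step(w: Vector, grad: Callable[[Vector], Vector], lr: float) -> Vector:
    """One inner-loop step: plain gradient descent (stand-in for SGD)."""
    g = grad(w)
    return [wi - lr * gi for wi, gi in zip(w, g)]


def lookahead(
    w0: Vector,
    grad: Callable[[Vector], Vector],
    k: int = 5,          # hypothetical: number of fast-weight updates per slow update
    alpha: float = 0.5,  # hypothetical: slow-weight step size (assumed interpolation rule)
    lr: float = 0.1,     # hypothetical: inner-loop learning rate
    outer_steps: int = 100,
) -> Vector:
    slow = list(w0)
    for _ in range(outer_steps):
        # Fast weights start from the current slow weights.
        fast = list(slow)
        # Update the fast weights k times with the inner-loop optimizer.
        for _ in range(k):
            fast = gd_step(fast, grad, lr)
        # Update the slow weights once, moving a fraction alpha toward the latest fast weights.
        slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
    return slow


# Usage: minimize f(w) = 0.5 * ||w - target||^2, whose gradient is w - target.
target = [1.0, -2.0]
result = lookahead([0.0, 0.0], lambda w: [wi - ti for wi, ti in zip(w, target)])
print(result)
```

Because the slow weights only interpolate toward where the inner optimizer ended up, each outer step here shrinks the error by a factor of 1 - alpha * (1 - 0.9**5), about 0.795. Swapping `gd_step` for any other update rule leaves the outer loop unchanged, which mirrors the point that any optimizer can serve as the inner loop.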

Bibliographic Details
Main Authors:	ZHOU, Pan, YAN, Hanshu, YUAN, Xiaotong, FENG, Jiashi, YAN, Shuicheng
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2021
Subjects:	Theory and Algorithms
Online Access:	https://ink.library.smu.edu.sg/sis_research/8987 https://ink.library.smu.edu.sg/context/sis_research/article/9990/viewcontent/2021_NeurIPS_lookahead.pdf
Institution:	Singapore Management University