Towards understanding why Lookahead generalizes better than SGD and beyond

To train networks, the lookahead algorithm [1] updates its fast weights k times via an inner-loop optimizer, then updates its slow weights once using the latest fast weights. Any optimizer, e.g. SGD, can serve as the inner-loop optimizer, and the derived lookahead generally enjoys remarkable test p...
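
The update scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `lookahead_sgd`, the toy quadratic objective, and all default values are hypothetical. The slow-weight rule used here (move the slow weights a fraction `alpha` of the way toward the latest fast weights) is assumed from the original lookahead proposal cited as [1]; the abstract itself only says the slow weights are updated from the latest fast weights. A deterministic gradient stands in for a stochastic one to keep the example reproducible.

```python
from typing import Callable, List

Vector = List[float]


def lookahead_sgd(
    grad: Callable[[Vector], Vector],  # returns the gradient at a point
    init: Vector,                      # initial slow weights
    k: int = 5,                        # inner-loop (fast-weight) steps per outer step
    alpha: float = 0.5,                # slow-weight interpolation factor (assumed rule)
    lr: float = 0.1,                   # inner-loop learning rate
    outer_steps: int = 50,             # number of slow-weight updates
) -> Vector:
    slow = list(init)
    for _ in range(outer_steps):
        # Fast weights start each outer step from the current slow weights.
        fast = list(slow)
        for _ in range(k):
            # Inner-loop optimizer: plain gradient descent steps.
            g = grad(fast)
            fast = [w - lr * gi for w, gi in zip(fast, g)]
        # Slow weights move toward the latest fast weights.
        slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
    return slow


def quadratic_grad(w: Vector) -> Vector:
    # Gradient of the toy objective sum((w_i - 3)^2), minimized at w_i = 3.
    return [2.0 * (wi - 3.0) for wi in w]


weights = lookahead_sgd(quadratic_grad, [0.0, 10.0])
```

On this toy objective both coordinates of `weights` approach 3. Setting `alpha` to 1 makes the slow weights simply copy the fast weights, which reduces the sketch to running the inner-loop optimizer alone.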

Bibliographic Details
Main Authors:	ZHOU, Pan, YAN, Hanshu, YUAN, Xiaotong, FENG, Jiashi, YAN, Shuicheng
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2021
Subjects:	Theory and Algorithms
Online Access:	https://ink.library.smu.edu.sg/sis_research/8987 https://ink.library.smu.edu.sg/context/sis_research/article/9990/viewcontent/2021_NeurIPS_lookahead.pdf
Institution:	Singapore Management University

Similar Items

Towards theoretically understanding why SGD generalizes better than ADAM in deep learning
by: ZHOU, Pan, et al.
Published: (2020)

Understanding generalization and optimization performance of deep CNNs
by: ZHOU, Pan, et al.
Published: (2018)

Towards understanding convergence and generalization of AdamW
by: ZHOU, Pan, et al.
Published: (2024)

Empirical risk landscape analysis for understanding deep neural networks
by: ZHOU, Pan, et al.
Published: (2018)

Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds
by: ZHOU, Pan, et al.
Published: (2019)

Dynamic lookahead mechanism for conserving power in multi-player mobile games
by: THIRUGNANAM, Karthik, et al.
Published: (2012)

Towards understanding why mask reconstruction pretraining helps in downstream tasks
by: PAN, Jiachun, et al.
Published: (2023)

ALTERASI DAN MINERALISASI CONTO BATUAN SUMUR SGD 01, SGD 03, DAN SGD 04 DAERAH SELODONG, LOMBOK BARAT NUSA TENGGARA BARAT
by: Didit Haryanto, Agus

The predictive audit: Why prevention is better than a cure
by: GOH, Clarence
Published: (2017)

Factorized carry lookahead adders
by: Balasubramanian, Padmanabhan, et al.
Published: (2020)

Asia in 2015: A lookahead
by: Singapore Management University
Published: (2015)

Better to understand than to be understood: Case study
by: Coruna, Maria Juanita J.
Published: (1977)

Why breast cancer signatures are no better than random signatures explained
by: Goh, Wilson Wen Bin, et al.
Published: (2020)

Hybrid stochastic-deterministic minibatch proximal gradient: Less-than-single-pass optimization with nearly optimal generalization
by: ZHOU, Pan, et al.
Published: (2020)

Why CPF-Style Systems Generally Work Better
by: HOON, Hian Teck
Published: (2014)

Automatic billing counterfeit detection for SGD money
by: Arun Ramchandani.
Published: (2010)

Efficient stochastic gradient hard thresholding
by: ZHOU, Pan, et al.
Published: (2018)

Task similarity aware meta learning: Theory-inspired improvement on MAML
by: ZHOU, Pan, et al.
Published: (2021)

Better than bread and butter: Singapore’s general election 2011
by: Knowledge@SMU
Published: (2011)

Beyond search: Event-driven summarization for web videos
by: HONG, Richard, et al.
Published: (2011)

Analysis of properties of single molecules in vivo or... why small fish is better than empty dish
by: Korzh, V., et al.
Published: (2014)

Bayesian optimization with switching cost: Regret analysis and lookahead variants
by: LIU, Peng, et al.
Published: (2023)

A higher radix architecture for quantum carry-lookahead adder
by: Wang, Siyi, et al.
Published: (2024)

Anchors weigh more than power: Why absolute powerlessness liberates negotiators to achieve better outcomes
by: SCHAERER, Michael, et al.
Published: (2015)

Inception transformer
by: SI, Chenyang, et al.
Published: (2022)

Dynamic lookahead mechanism for conserving power in multi-player mobile games
by: Thirugnanam, K., et al.
Published: (2013)

Win: Weight-decay-integrated nesterov acceleration for adaptive gradient algorithms
by: ZHOU, Pan, et al.
Published: (2023)

New insight into hybrid stochastic gradient descent: Beyond with-replacement sampling and convexity
by: ZHOU, Pan, et al.
Published: (2018)

SGD-Rec: A Matrix Decomposition Based Model for Personalized Movie Recommendation
by: Siripen Pongpaichet, et al.
Published: (2020)

Speed and energy optimized quasi-delay-insensitive block carry lookahead adder
by: Mastorakis, N. E., et al.
Published: (2019)

Compassion: Why it is better to eat fish
by: Knowledge@SMU
Published: (2009)

Escaping saddle points in heterogeneous federated learning via distributed SGD with communication compression
by: CHEN, Sijin, et al.
Published: (2024)

Multilateralism in 2021 : better than 2020?
by: Ng, Joel
Published: (2021)

Answering why-not and why questions on reverse top-k queries
by: LIU, Qing, et al.
Published: (2016)

Better Guider Predicts Future Better: Difference Guided Generative Adversarial Networks
by: YING, GUOHAO, et al.
Published: (2019)

BETTER SAFE THAN SORRY: ROLE OF PSYCHOLOGICAL SAFETY AND WORK-BASED SOCIAL SUPPORT IN UNDERSTANDING THE IMPACTS OF TECHNOSTRESS
by: CHUA WAN LIN, CHERYL
Published: (2020)

Hands-on is better than look-on: Condom use
by: Pratak O-Prasertsawat, et al.
Published: (2018)

A better inaugural speech than usual
by: La Viña, Antonio Gabriel M.
Published: (2022)

NUS is better than NTU - myth or reality?
by: Ho, Hsiao Ming., et al.
Published: (2008)