Towards understanding convergence and generalization of AdamW
AdamW modifies Adam by adding decoupled weight decay, which shrinks the network weights at every training iteration. For adaptive algorithms, this decoupled weight decay does not affect specific optimization steps, and so differs from the widely used ℓ2-regularizer, which changes optimization steps via changing the...
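The contrast the abstract draws can be sketched in a few lines. The following is a minimal illustration, not the authors' formulation: it assumes the commonly used update in which the decoupled decay subtracts `lr * weight_decay * w` from the weight, and the function name `adam_step` and all hyperparameter defaults are hypothetical choices made for this example.

```python
import math

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, decoupled=True):
    """One Adam update on a scalar weight w (illustrative sketch).

    Returns the updated (w, m, v). `t` is the 1-based step count.
    """
    if not decoupled:
        # l2-regularizer: the decay term is folded into the gradient,
        # so it is also rescaled by the adaptive second-moment estimate.
        grad = grad + weight_decay * w

    # Standard Adam moment estimates with bias correction.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    w_new = w - lr * m_hat / (math.sqrt(v_hat) + eps)

    if decoupled:
        # AdamW (assumed form): decay acts on the weight directly and
        # never passes through the adaptive rescaling above.
        w_new = w_new - lr * weight_decay * w
    return w_new, m, v

# With a zero loss gradient, only the decay term moves the weight.
w_adamw, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1,
                          weight_decay=0.1, decoupled=True)
w_l2, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1,
                       weight_decay=0.1, decoupled=False)
```

In this toy setting the decoupled variant shrinks the weight by `lr * weight_decay * w`, whereas the ℓ2 variant takes a step of roughly `lr` whatever the decay coefficient is, because the decay term is normalized like any other gradient. That is one way to see why the two schemes are not interchangeable for adaptive methods.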
Main Authors: | ZHOU, Pan, XIE, Xingyu, LIN, Zhouchen, YAN, Shuicheng |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2024 |
Online Access: | https://ink.library.smu.edu.sg/sis_research/8986 https://ink.library.smu.edu.sg/context/sis_research/article/9989/viewcontent/2023_TPAMI_AdamW_Analysis.pdf |
Institution: | Singapore Management University |
Similar Items
- LOWER CENTRAL SERIES IN UNSTABLE HOMOTOPY THEORY
  by: FEDOR PAVUTNITSKIY
  Published: (2018)
- A 3-D simulator using ADAMS for design of an autonomous gyroscopically stabilized single wheel robot
  by: Zhu, Z., et al.
  Published: (2014)
- PREPARATION OF MICROCAPSULES CONTAINING ANTHOCYANINS BY W/O/W EMULSION TECHNOLOGY AND SPRAY DRYING
  by: HUANG YUAN
  Published: (2016)
- Protein-Ligand Blind Docking Using QuickVina-W With Inter-Process Spatio-Temporal Integration
  by: Hassan, Nafisa Mohamed, et al.
  Published: (2018)
- Adam Smith, Settler Colonialism, and Limits of Liberal Anti-Imperialism
  by: INCE, Onur Ulas
  Published: (2021)