Towards understanding convergence and generalization of AdamW
AdamW modifies Adam by adding a decoupled weight decay to decay network weights per training iteration. For adaptive algorithms, this decoupled weight decay does not affect specific optimization steps, and differs from the widely used ℓ2-regularizer which changes optimization steps via changing the...
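The distinction drawn in the abstract can be illustrated with a minimal single-parameter sketch. It is not taken from the paper: the function names `adam_l2_step` and `adamw_step`, the scalar setting, and the default hyperparameters are assumptions chosen for illustration, and the update form follows the commonly used description of AdamW rather than the authors' exact formulation.

```python
import math


def adam_l2_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, wd=1e-2):
    """One Adam step with an l2 regularizer (hypothetical helper).

    The decay term wd * w is added to the gradient, so it flows through
    both moment estimates and is rescaled by the adaptive step size.
    """
    g = g + wd * w
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


def adamw_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, wd=1e-2):
    """One AdamW step with decoupled weight decay (hypothetical helper).

    The moment estimates see only the loss gradient; the decay term
    wd * w is applied directly to the weight, outside the adaptive scaling.
    """
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * w)
    return w, m, v


# With a zero loss gradient, only the decay term acts on the weight.
w_l2, _, _ = adam_l2_step(1.0, 0.0, 0.0, 0.0, t=1)
w_aw, _, _ = adamw_step(1.0, 0.0, 0.0, 0.0, t=1)
print(w_l2, w_aw)
```

In this sketch, with a zero loss gradient the ℓ2 variant moves the weight by roughly `lr`, because the decay term is normalized by the adaptive scaling, whereas the decoupled variant shrinks it by `lr * wd * w` and leaves the moment estimates untouched.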
Main Authors: | ZHOU, Pan; XIE, Xingyu; LIN, Zhouchen; YAN, Shuicheng |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2024 |
Online Access: | https://ink.library.smu.edu.sg/sis_research/8986 https://ink.library.smu.edu.sg/context/sis_research/article/9989/viewcontent/2023_TPAMI_AdamW_Analysis.pdf |
Similar Items
- LOWER CENTRAL SERIES IN UNSTABLE HOMOTOPY THEORY
  by: FEDOR PAVUTNITSKIY
  Published: (2018)
- A 3-D simulator using ADAMS for design of an autonomous gyroscopically stabilized single wheel robot
  by: Zhu, Z., et al.
  Published: (2014)
- PREPARATION OF MICROCAPSULES CONTAINING ANTHOCYANINS BY W/O/W EMULSION TECHNOLOGY AND SPRAY DRYING
  by: HUANG YUAN
  Published: (2016)
- Protein-Ligand Blind Docking Using QuickVina-W With Inter-Process Spatio-Temporal Integration
  by: Hassan, Nafisa Mohamed, et al.
  Published: (2018)
- Adam Smith, Settler Colonialism, and Limits of Liberal Anti-Imperialism
  by: INCE, Onur Ulas
  Published: (2021)