Adan: Adaptive Nesterov Momentum Algorithm for faster optimizing deep models

In deep learning, different kinds of deep networks typically need different optimizers, which have to be chosen after multiple trials, making the training process inefficient. To relieve this issue and consistently improve the model training speed across deep networks, we propose the ADAptive Nester...

Bibliographic Details
Main Authors:	XIE, Xingyu, ZHOU, Pan, LI, Huan, LIN, Zhouchen, YAN, Shuicheng
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Adaptive optimizer; Complexity theory; Computer architecture; Convergence; Deep learning; DNN optimizer; Fast DNN training; Stochastic processes; Task analysis; Training; OS and Networks; Theory and Algorithms
Online Access:	https://ink.library.smu.edu.sg/sis_research/9037 https://ink.library.smu.edu.sg/context/sis_research/article/10040/viewcontent/ADAN_sv.pdf
Institution:	Singapore Management University

Similar Items

Win: Weight-decay-integrated Nesterov acceleration for faster network training
by: ZHOU, Pan, et al.
Published: (2024)

Win: Weight-decay-integrated nesterov acceleration for adaptive gradient algorithms
by: ZHOU, Pan, et al.
Published: (2023)

DeepRepair: Style-guided repairing for deep neural networks in the real-world operational environment
by: YU, Bing, et al.
Published: (2021)

Test optimization in DNN testing: A survey
by: HU, Qiang, et al.
Published: (2024)

Faster first-order methods for stochastic non-convex optimization on Riemannian manifolds
by: ZHOU, Pan, et al.
Published: (2019)

HolyLight : a nanophotonic accelerator for deep learning in data centers
by: Liu, Weichen, et al.
Published: (2020)

NNFacet: splitting neural network for concurrent smart sensors
by: Chen, Jiale, et al.
Published: (2023)

An Investigation of Deep Learning Models for EEG-Based Emotion Recognition
by: Zhang, Y., et al.
Published: (2021)

Code-switching detection using multilingual DNNs
by: Yilmaz E., et al.
Published: (2018)

Modeling the chemotaxis behaviors of C. Elegans using neural network: from artificial to biological approach
by: DENG XIN
Published: (2013)

Topic tones of analyst reports and stock returns: A deep learning approach
by: IWASAKI, Hitoshi, et al.
Published: (2020)

Efficient gradient support pursuit with less hard thresholding for cardinality-constrained learning
by: SHANG, Fanhua, et al.
Published: (2021)

A buyer-traceable DNN model IP protection method against piracy and misappropriation
by: Wang, Si, et al.
Published: (2022)

A DNN fingerprint for non-repudiable model ownership identification and piracy detection
by: Zheng, Yue, et al.
Published: (2022)

A hybrid stochastic-deterministic minibatch proximal gradient method for efficient optimization and generalization
by: ZHOU, Pan, et al.
Published: (2021)

METAMODELING AND OPTIMIZATION WITH GAUSSIAN PROCESS MODELS FOR STOCHASTIC SIMULATIONS
by: WANG SONGHAO
Published: (2020)

Tractable approximations to robust conic optimization problems
by: Bertsimas, D., et al.
Published: (2013)

Mercury: an automated remote side-channel attack to Nvidia deep learning accelerator
by: Yan, Xiaobei, et al.
Published: (2023)

Efficient simulation budget allocation with regression
by: Brantley, M.W., et al.
Published: (2014)

STRUCTURED DATA ANALYSIS: MODELS, ALGORITHMS AND THEORIES
by: ZHOU PAN
Published: (2020)

Fingerprinting deep neural networks - a DeepFool approach
by: Wang, Si, et al.
Published: (2021)

Reinforced adaptation network for partial domain adaptation
by: WU, Keyu, et al.
Published: (2023)

Towards understanding convergence and generalization of AdamW
by: ZHOU, Pan, et al.
Published: (2024)

A 3 MONTH FOLLOW UP ASSESSMENT EVALUATING THE EFFECTIVENESS OF DUAL TASK TRAINING IN IMPROVING THE COGNITION OF AN ELDERLY AT RISK OF EARLY DEMENTIA: AN EXPLORATORY STUDY
by: CHIN YUIN YIH
Published: (2018)

Click-through-based cross-view learning for image search
by: PAN, Yingwei, et al.
Published: (2014)

Multilayer dielectric filter design using a multiobjective evolutionary algorithm
by: Venkatarayalu, N.V., et al.
Published: (2014)

Rotation invariant convolutions for 3D point clouds deep learning
by: ZHANG, Zhiyuan, et al.
Published: (2019)

EDGE OF CHAOS IN DEEP LEARNING MODELS AND ITS APPLICATION TO TRAINING ALGORITHMS
by: ZHANG LIN
Published: (2021)

Click-through-based subspace learning for image search
by: PAN, Yingwei, et al.
Published: (2014)

Expediting the accuracy-improving process of SVMs for class imbalance learning
by: CAO, Bin, et al.
Published: (2021)

4-bit shampoo for memory-efficient network training
by: WANG, Sike, et al.
Published: (2024)

Self-learning neurofuzzy control of a liquid helium cryostat
by: Tan, W.W., et al.
Published: (2014)

A self-learning fuzzy controller for embedded applications
by: Tan, W.W., et al.
Published: (2014)

Faster rates for compressed federated learning with client-variance reduction
by: ZHAO, Haoyu, et al.
Published: (2024)

Mitigating popularity bias in recommendation with unbalanced interactions: A gradient perspective
by: REN, Weijieying, et al.
Published: (2022)

Effects of the Training BIG™ and Task specific concepts on turning over 180 degrees in patients with Parkinson’s disease
by: Fuengfa Khobkhun, et al.
Published: (2014)

Task decomposition with pattern distributor networks
by: BAO CHUNYU
Published: (2010)

APPLIED STOCHASTIC CONTROL IN OPTIMAL LIQUIDATION STRATEGIES
by: TRAN HOANG HAI
Published: (2021)

Models for minimax stochastic linear optimization problems with risk aversion
by: Bertsimas, D., et al.
Published: (2013)