Accumulated decoupled learning with gradient staleness mitigation for convolutional neural networks
Main Authors:
Other Authors:
Format: Conference or Workshop Item
Language: English
Published: 2024
Subjects:
Online Access: https://hdl.handle.net/10356/174480 https://icml.cc/virtual/2021/index.html
Institution: Nanyang Technological University
Summary: Gradient staleness is a major side effect in decoupled learning when training convolutional neural networks asynchronously. Existing methods that ignore this effect might suffer reduced generalization and even divergence. In this paper, we propose accumulated decoupled learning (ADL), which includes module-wise gradient accumulation in order to mitigate the gradient staleness. Unlike prior work that ignores gradient staleness, we quantify the staleness in such a way that its mitigation can be visualized quantitatively. As a new learning scheme, the proposed ADL is theoretically shown to converge to critical points in spite of its asynchronism. Extensive experiments on the CIFAR-10 and ImageNet datasets demonstrate that ADL gives promising generalization results while state-of-the-art methods experience reduced generalization and divergence. In addition, ADL is shown to have the fastest training speed among the compared methods. The code will be available soon at https://github.com/ZHUANGHP/Accumulated-Decoupled-Learning.git.
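To illustrate the module-wise gradient accumulation idea described in the summary, below is a minimal sketch. It is not the authors' ADL implementation (see the repository linked above for the official code): the two-module split, the accumulation count K, the one-micro-batch gradient delay used to imitate asynchrony, and the toy data are all illustrative assumptions.

```python
# Minimal sketch of module-wise gradient accumulation in a decoupled
# (delayed-gradient) training loop. NOT the authors' ADL code; the module
# split, K, and toy data are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical split of a small network into two decoupled modules.
module1 = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
module2 = nn.Sequential(nn.Linear(64, 10))
opt1 = torch.optim.SGD(module1.parameters(), lr=0.1)
opt2 = torch.optim.SGD(module2.parameters(), lr=0.1)

K = 4                  # micro-batches accumulated before each parameter update
loss_fn = nn.CrossEntropyLoss()
stale_grad = None      # gradient w.r.t. module1's output, delayed by one micro-batch

for step in range(8):
    opt1.zero_grad()
    opt2.zero_grad()
    for _ in range(K):                        # module-wise gradient accumulation
        x = torch.randn(16, 32)               # toy micro-batch
        y = torch.randint(0, 10, (16,))

        h = module1(x)
        h_local = h.detach().requires_grad_(True)   # cut the graph between modules

        # Module 2 trains on its own local graph with up-to-date activations.
        loss = loss_fn(module2(h_local), y) / K      # average over K micro-batches
        loss.backward()

        # Module 1 receives a *delayed* gradient from the previous micro-batch,
        # mimicking the asynchrony that causes gradient staleness.
        if stale_grad is not None:
            h.backward(stale_grad)
        stale_grad = h_local.grad.detach()

    # Each module updates its parameters only once per K accumulated
    # micro-batches, so staleness measured in parameter updates shrinks.
    opt1.step()
    opt2.step()
```

Because each module applies its optimizer step only once per K accumulated micro-batches, the staleness measured in parameter updates is roughly reduced by a factor of K, which is the intuition behind the mitigation described in the summary.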