Accumulated decoupled learning with gradient staleness mitigation for convolutional neural networks
Saved in:
Main Authors: | |
---|---|
Other Authors: | |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2024 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/174480 https://icml.cc/virtual/2021/index.html |
Institution: | Nanyang Technological University |
---|---|
Language: | English |
Summary: | Gradient staleness is a major side effect of decoupled learning when training convolutional neural networks asynchronously. Existing methods that ignore this effect can suffer reduced generalization and even divergence. In this paper, we propose accumulated decoupled learning (ADL), which includes module-wise gradient accumulation to mitigate gradient staleness. Unlike prior works that ignore gradient staleness, we quantify the staleness so that its mitigation can be quantitatively visualized. As a new learning scheme, the proposed ADL is theoretically shown to converge to critical points despite its asynchrony. Extensive experiments on the CIFAR-10 and ImageNet datasets demonstrate that ADL delivers promising generalization while state-of-the-art methods suffer reduced generalization and divergence. In addition, ADL has the fastest training speed among the compared methods. The code will be released at https://github.com/ZHUANGHP/Accumulated-Decoupled-Learning.git. |
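
For readers unfamiliar with the terminology, the sketch below shows what module-wise gradient accumulation across a decoupled (detached) module boundary can look like in a plain PyTorch training loop. It is only a simplified, synchronous stand-in written for this record, not the authors' ADL implementation (their code is the GitHub repository linked above); all names and constants in it (`block1`, `ACCUM_STEPS`, and so on) are hypothetical.

```python
# Simplified, synchronous illustration of module-wise gradient accumulation
# across a decoupled boundary. Not the authors' ADL code; all names here are
# hypothetical. Requires only PyTorch.
import torch
import torch.nn as nn

# Two "modules" standing in for the decoupled blocks of a CNN.
block1 = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
block2 = nn.Sequential(nn.Flatten(), nn.Linear(8 * 32 * 32, 10))

# Each module keeps its own optimizer, as in module-wise (decoupled) training.
opt1 = torch.optim.SGD(block1.parameters(), lr=0.01)
opt2 = torch.optim.SGD(block2.parameters(), lr=0.01)

ACCUM_STEPS = 4                      # mini-batches accumulated per update
criterion = nn.CrossEntropyLoss()

for step in range(100):
    x = torch.randn(16, 3, 32, 32)   # stand-in mini-batch
    y = torch.randint(0, 10, (16,))

    h = block1(x)
    # Cut the autograd graph at the module boundary: block2 is trained on a
    # detached copy, so the two blocks no longer share one backward pass.
    h_boundary = h.detach().requires_grad_(True)
    loss = criterion(block2(h_boundary), y) / ACCUM_STEPS  # scale for averaging

    loss.backward()                  # grads for block2 and for h_boundary
    # In asynchronous decoupled training this boundary gradient would arrive
    # delayed (stale); here we feed it back immediately to stay synchronous.
    h.backward(h_boundary.grad)      # grads for block1

    # Module-wise accumulation: apply the summed gradients every ACCUM_STEPS
    # mini-batches instead of after every mini-batch.
    if (step + 1) % ACCUM_STEPS == 0:
        opt1.step(); opt1.zero_grad()
        opt2.step(); opt2.zero_grad()
```

In the actual asynchronous setting described by the abstract, the boundary gradient fed into the earlier module comes from a past iteration, which is the source of the staleness; accumulating several such gradients per module before each update is the mitigation the abstract refers to.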