Fully decoupled neural network learning using delayed gradients

Training neural networks with back-propagation (BP) requires a sequential passing of activations and gradients. This has been recognized as the lockings (i.e., the forward, backward, and update lockings) among modules (each module contains a stack of layers) inherited from BP. In this paper,...

Full description

Bibliographic Details
Main Authors: Zhuang, Huiping, Wang, Yi, Liu, Qinglai, Lin, Zhiping
Other Authors: School of Electrical and Electronic Engineering
Format: Article
Language: English
Published: 2024
Subjects:
Online Access:https://hdl.handle.net/10356/174476
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-174476
record_format dspace
spelling sg-ntu-dr.10356-174476 2024-04-05T15:41:28Z Fully decoupled neural network learning using delayed gradients Zhuang, Huiping Wang, Yi Liu, Qinglai Lin, Zhiping School of Electrical and Electronic Engineering Temasek Laboratories @ NTU Computer and Information Science Decoupled learning Delayed gradients Training neural networks with back-propagation (BP) requires a sequential passing of activations and gradients. This has been recognized as the lockings (i.e., the forward, backward, and update lockings) among modules (each module contains a stack of layers) inherited from BP. In this paper, we propose a fully decoupled training scheme using delayed gradients (FDG) to break all these lockings. The FDG splits a neural network into multiple modules and trains them independently and asynchronously using different workers (e.g., GPUs). We also introduce a gradient shrinking process to reduce the stale gradient effect caused by the delayed gradients. Our theoretical proofs show that the FDG can converge to critical points under certain conditions. Experiments are conducted by training deep convolutional neural networks to perform classification tasks on several benchmark datasets. These experiments show comparable or better results of our approach compared with the state-of-the-art methods in terms of generalization and acceleration. We also show that the FDG is able to train various networks, including extremely deep ones (e.g., ResNet-1202), in a decoupled fashion. Agency for Science, Technology and Research (A*STAR) Submitted/Accepted version This work was supported in part by the Science and Engineering Research Council, Agency for Science, Technology and Research, Singapore, through the National Robotics Program under Grant 1922500054. 2024-04-01T04:53:03Z 2024-04-01T04:53:03Z 2021 Journal Article Zhuang, H., Wang, Y., Liu, Q. & Lin, Z. (2021). Fully decoupled neural network learning using delayed gradients. IEEE Transactions on Neural Networks and Learning Systems, 33(10), 6013-6020. https://dx.doi.org/10.1109/TNNLS.2021.3069883 2162-237X https://hdl.handle.net/10356/174476 10.1109/TNNLS.2021.3069883 10 33 6013 6020 en NRP-1922500054. IEEE Transactions on Neural Networks and Learning Systems © 2021 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/TNNLS.2021.3069883. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
Decoupled learning
Delayed gradients
spellingShingle Computer and Information Science
Decoupled learning
Delayed gradients
Zhuang, Huiping
Wang, Yi
Liu, Qinglai
Lin, Zhiping
Fully decoupled neural network learning using delayed gradients
description Training neural networks with back-propagation (BP) requires a sequential passing of activations and gradients. This has been recognized as the lockings (i.e., the forward, backward, and update lockings) among modules (each module contains a stack of layers) inherited from BP. In this paper, we propose a fully decoupled training scheme using delayed gradients (FDG) to break all these lockings. The FDG splits a neural network into multiple modules and trains them independently and asynchronously using different workers (e.g., GPUs). We also introduce a gradient shrinking process to reduce the stale gradient effect caused by the delayed gradients. Our theoretical proofs show that the FDG can converge to critical points under certain conditions. Experiments are conducted by training deep convolutional neural networks to perform classification tasks on several benchmark datasets. These experiments show comparable or better results of our approach compared with the state-of-the-art methods in terms of generalization and acceleration. We also show that the FDG is able to train various networks, including extremely deep ones (e.g., ResNet-1202), in a decoupled fashion.
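To make the procedure in the description concrete, the following is a minimal, single-process PyTorch sketch of module-wise training with a delayed, shrunk boundary gradient. It is not the authors' implementation: the two-module split, the one-step delay, the layer sizes, the synthetic data, and the shrinking factor of 0.5 are all illustrative assumptions, whereas the FDG of the paper runs each module asynchronously on its own worker (e.g., a GPU).

```python
# Minimal single-process sketch (not the authors' code) of decoupled training
# with a one-step delayed boundary gradient and a gradient-shrinking factor.
import torch
import torch.nn as nn

# Split a small classifier into two modules that are updated independently.
module1 = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
module2 = nn.Linear(256, 10)
opt1 = torch.optim.SGD(module1.parameters(), lr=0.1)
opt2 = torch.optim.SGD(module2.parameters(), lr=0.1)

shrink = 0.5   # assumed shrinking coefficient in (0, 1] applied to delayed gradients
stash = None   # (module1 output, boundary gradient) kept from the previous step

# Synthetic data standing in for a real classification dataset.
loader = [(torch.randn(32, 784), torch.randint(0, 10, (32,))) for _ in range(100)]

for x, y in loader:
    # Apply the delayed (one-step-old) boundary gradient to module 1 first,
    # shrunk to damp the stale-gradient effect.
    if stash is not None:
        prev_out, prev_grad = stash
        opt1.zero_grad()
        prev_out.backward(shrink * prev_grad)
        opt1.step()

    # Current forward through module 1; the detach breaks the backward locking,
    # so module 2's backward pass stops at the module boundary.
    out = module1(x)
    boundary = out.detach().requires_grad_(True)

    # Module 2 trains on the current batch as usual.
    loss = nn.functional.cross_entropy(module2(boundary), y)
    opt2.zero_grad()
    loss.backward()
    opt2.step()

    # Hand the boundary gradient back to module 1; it is consumed one step
    # later, emulating asynchronous workers exchanging delayed gradients.
    stash = (out, boundary.grad.clone())
```

In the paper's fully decoupled setting, the stash would be replaced by communication between workers, modules closer to the input see older gradients, and the shrinking process is what compensates for that growing staleness.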
author2 School of Electrical and Electronic Engineering
author_facet School of Electrical and Electronic Engineering
Zhuang, Huiping
Wang, Yi
Liu, Qinglai
Lin, Zhiping
format Article
author Zhuang, Huiping
Wang, Yi
Liu, Qinglai
Lin, Zhiping
author_sort Zhuang, Huiping
title Fully decoupled neural network learning using delayed gradients
title_short Fully decoupled neural network learning using delayed gradients
title_full Fully decoupled neural network learning using delayed gradients
title_fullStr Fully decoupled neural network learning using delayed gradients
title_full_unstemmed Fully decoupled neural network learning using delayed gradients
title_sort fully decoupled neural network learning using delayed gradients
publishDate 2024
url https://hdl.handle.net/10356/174476
_version_ 1800916374402367488