Fully decoupled neural network learning using delayed gradients

Bibliographic Details
Main Authors:	Zhuang, Huiping, Wang, Yi, Liu, Qinglai, Lin, Zhiping
Other Authors:	School of Electrical and Electronic Engineering
Format:	Article
Language:	English
Published:	2024
Subjects:	Computer and Information Science Decoupled learning Delayed gradients
Online Access:	https://hdl.handle.net/10356/174476
Institution:	Nanyang Technological University

Citation:	Zhuang, H., Wang, Y., Liu, Q. & Lin, Z. (2021). Fully decoupled neural network learning using delayed gradients. IEEE Transactions on Neural Networks and Learning Systems, 33(10), 6013-6020. https://dx.doi.org/10.1109/TNNLS.2021.3069883
ISSN:	2162-237X
DOI:	10.1109/TNNLS.2021.3069883
Other Affiliations:	Temasek Laboratories @ NTU
Version:	Submitted/Accepted version
Funding:	Agency for Science, Technology and Research (A*STAR). This work was supported in part by the Science and Engineering Research Council, Agency for Science, Technology and Research, Singapore, through the National Robotics Program under Grant 1922500054 (NRP-1922500054).
Rights:	IEEE Transactions on Neural Networks and Learning Systems © 2021 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/TNNLS.2021.3069883.
Deposited:	2024-04-01
File Format:	application/pdf
Abstract:	Training neural networks with back-propagation (BP) requires a sequential passing of activations and gradients. This requirement gives rise to lockings (i.e., the forward, backward, and update lockings) among modules (each module containing a stack of layers) that are inherited from BP. In this paper, we propose a fully decoupled training scheme using delayed gradients (FDG) to break all these lockings. The FDG splits a neural network into multiple modules and trains them independently and asynchronously on different workers (e.g., GPUs). We also introduce a gradient shrinking process to reduce the stale-gradient effect caused by the delayed gradients. Our theoretical proofs show that the FDG can converge to critical points under certain conditions. Experiments are conducted by training deep convolutional neural networks on classification tasks over several benchmark datasets. These experiments show that our approach achieves results comparable to or better than those of state-of-the-art methods in terms of generalization and acceleration. We also show that the FDG is able to train various networks, including extremely deep ones (e.g., ResNet-1202), in a decoupled fashion.
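The abstract's central idea, that a lower module can keep training on gradients that arrive several steps late, can be illustrated with a toy simulation. The sketch below is a minimal illustration under stated assumptions, not the FDG implementation: it uses a hypothetical two-module scalar model y = w2 * (w1 * x) trained by SGD on the target y = 3x, runs both modules in one sequential loop rather than on separate workers, and models only the delayed backward signal (not the forward or update lockings). The gradient shrinking process is modeled as a hypothetical constant factor `beta` applied to the gradient sent down to the lower module; the paper's exact shrinking rule is not stated in this record. The names `train_delayed`, `delay`, and `beta` are illustrative only.

```python
from collections import deque


def train_delayed(steps=2000, lr=0.05, delay=1, beta=1.0):
    """Toy sketch (hypothetical, not the paper's FDG code): two scalar modules,
    h = w1 * x (lower) and y = w2 * h (upper), fit to the target y = 3 * x.

    The upper module updates immediately; the lower module only receives the
    gradient dL/dh after `delay` steps, so it updates with a stale gradient.
    `beta` is an assumed constant shrinking factor on that delayed gradient.
    """
    w1, w2 = 0.5, 0.5
    data = [(x, 3.0 * x) for x in (-1.0, -0.5, 0.5, 1.0)]
    in_flight = deque()  # (input x, dL/dh) messages sent down but not yet applied

    for step in range(steps):
        x, target = data[step % len(data)]
        h = w1 * x              # lower module forward pass
        y = w2 * h              # upper module forward pass
        err = y - target        # dL/dy for L = 0.5 * (y - target) ** 2

        grad_h = err * w2       # gradient for the lower module, computed before w2 changes
        w2 -= lr * err * h      # upper module updates with a fresh gradient
        in_flight.append((x, beta * grad_h))

        # Once more than `delay` messages are queued, the oldest one is applied.
        if len(in_flight) > delay:
            x_old, g_old = in_flight.popleft()
            w1 -= lr * g_old * x_old  # dL/dw1 = dL/dh * x for h = w1 * x

    return w1, w2


for d, b in [(0, 1.0), (1, 1.0), (1, 0.5)]:
    w1, w2 = train_delayed(delay=d, beta=b)
    print(f"delay={d} beta={b}: w1*w2 = {w1 * w2:.4f}")
```

In each configuration the product w1 * w2 is expected to approach the target slope of 3 despite the stale gradient, which is the qualitative behavior the abstract describes. This toy problem says nothing about the convergence conditions, the actual shrinking rule, or the acceleration results reported in the paper.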
