Speeding up deep neural network training with decoupled and analytic learning

Training deep neural networks usually demands a long period of time. This thesis explores methods in two areas, decoupled learning and analytic learning, to reduce the training time; the full description appears below.

Bibliographic Details
Main Author: Zhuang, Huiping
Other Authors: Lin, Zhiping
School: School of Electrical and Electronic Engineering
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2021
Subjects: Engineering::Electrical and electronic engineering
Online Access: https://hdl.handle.net/10356/153079
DOI: 10.32657/10356/153079
Citation: Zhuang, H. (2021). Speeding up deep neural network training with decoupled and analytic learning. Doctoral thesis, Nanyang Technological University, Singapore.
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Institution: Nanyang Technological University

Description

Training deep neural networks usually demands a long period of time. In this thesis, we explore methods in two areas, decoupled learning and analytic learning, to reduce the training time.

In decoupled learning, new methods are proposed to alleviate the sequential nature of backpropagation (BP), the most common means of training deep neural networks. BP requires a sequential passing of activations and gradients, a constraint known as the forward, backward, and update lockings. These lockings impose strong synchronism among modules (consecutive stacks of layers), leaving most modules idle during training. A fully decoupled learning method using delayed gradients (FDG) is first proposed to address all three lockings, and it improves training efficiency with a significant acceleration.
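
To make the locking and delayed-gradient idea concrete, here is a minimal numpy sketch (a toy illustration, not the FDG algorithm from the thesis): a two-module linear network in which module 1 updates with the gradient signal that module 2 produced one iteration earlier, so neither module waits for the other within an iteration. The network sizes, learning rate, and regression task are invented for the example.

```python
import numpy as np

# Toy illustration of training with delayed gradients (in the spirit of FDG,
# not the thesis implementation): a two-module linear network where module 1
# updates with the gradient signal that module 2 produced one iteration
# earlier, so neither module waits for the other within an iteration.
rng = np.random.default_rng(0)
n, d, h, c = 256, 20, 32, 5
X = rng.standard_normal((n, d))
Y = X @ rng.standard_normal((d, c))              # toy regression targets

W1 = rng.standard_normal((d, h)) / np.sqrt(d)    # module 1 weights
W2 = rng.standard_normal((h, c)) / np.sqrt(h)    # module 2 weights
lr = 0.02
delayed = None                                   # (stored input, delayed gradient w.r.t. H)

for step in range(300):
    H = X @ W1                         # module 1 forward
    pred = H @ W2                      # module 2 forward
    dpred = 2.0 * (pred - Y) / n
    dW2 = H.T @ dpred                  # module 2 backward, applied immediately
    dH = dpred @ W2.T                  # signal sent back towards module 1 ...
    W2 -= lr * dW2
    if delayed is not None:            # ... but consumed one iteration late
        X_old, dH_old = delayed
        W1 -= lr * (X_old.T @ dH_old)
    delayed = (X, dH)

print("final training loss:", float(np.mean((X @ W1 @ W2 - Y) ** 2)))
```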

Decoupled learning, however, inevitably introduces asynchronism, which causes gradient staleness (also known as the stale-gradient effect) and can degrade generalization or even cause divergence. An accumulated decoupled learning (ADL) method is therefore developed to cope with the staleness issue. ADL is shown, both theoretically and empirically, to reduce gradient staleness, and it generalizes better than existing decoupled methods that ignore staleness.
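
The accumulation idea behind ADL can be sketched on a toy objective: gradients arrive with a fixed staleness, and rather than applying each one immediately, the module sums a small window of them and performs a single averaged update, so the incoming gradients refer to fewer outdated parameter versions. The quadratic loss, delay, and window size below are illustrative assumptions, not the thesis's formulation.

```python
import numpy as np

# Rough sketch of the accumulation idea behind ADL (a hypothetical
# simplification, not the thesis algorithm). A module receives gradients that
# are `delay` steps stale; rather than applying each one immediately, it sums
# a window of `accum` gradients and applies a single averaged update, so the
# incoming gradients refer to fewer outdated parameter versions.
def staleness_demo(delay=2, accum=4, lr=0.1, steps=200):
    rng = np.random.default_rng(1)
    w = rng.standard_normal(10)                       # module parameters
    history = [w.copy() for _ in range(delay + 1)]    # past parameter versions
    buffer = np.zeros_like(w)
    applied = 0
    for _ in range(steps):
        w_stale = history[0]               # gradient is computed on stale params
        grad = 2.0 * w_stale               # toy objective: minimise ||w||^2
        buffer += grad
        applied += 1
        if applied == accum:               # one update per accumulation window
            w = w - lr * buffer / accum
            buffer = np.zeros_like(w)
            applied = 0
        history = history[1:] + [w.copy()]
    return float(np.sum(w ** 2))

print("final toy loss:", staleness_demo())
```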

New methods are also developed in analytic learning, which discards BP entirely and trains the network with analytical solutions; training completes within a single epoch and is therefore exceedingly fast. The first challenge in this area is finding analytical solutions for multilayer networks: existing methods suffer from limitations such as structural constraints or the requirement of invertible activation functions. A correlation projection network (CPNet) is developed that removes these limitations by treating the network as a combination of multiple two-layer modules. Analytic learning of CPNet becomes possible once the label information is projected into the hidden modules, so that each two-layer module can solve its locally supervised learning problem analytically with least-squares solutions.
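
A minimal sketch of analytic learning for one two-layer module follows. The random projection matrix P is only a hypothetical stand-in for CPNet's label projection into the hidden module; the essential point is that, once label information reaches the hidden layer, both weight matrices have closed-form least-squares solutions via the Moore-Penrose inverse and no backpropagation is needed.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_two_layer_module(X, Y, hidden=64):
    """Analytically train one 2-layer module with least squares (no BP).

    The random projection P is only a hypothetical stand-in for CPNet's label
    projection into the hidden module; once label information reaches the
    hidden layer, both weight matrices have closed-form solutions via the
    Moore-Penrose inverse.
    """
    P = rng.standard_normal((Y.shape[1], hidden))
    H_target = Y @ P                          # labels projected to hidden width
    W1 = np.linalg.pinv(X) @ H_target         # least-squares input weights
    H = np.maximum(X @ W1, 0.0)               # hidden activations (ReLU)
    W2 = np.linalg.pinv(H) @ Y                # least-squares output weights
    return W1, W2

# One-pass "training" on toy data: no iterative optimisation at all.
X = rng.standard_normal((500, 30))
Y = np.eye(4)[rng.integers(0, 4, size=500)]   # one-hot labels
W1, W2 = train_two_layer_module(X, Y)
pred = np.maximum(X @ W1, 0.0) @ W2
print("train accuracy:", np.mean(pred.argmax(1) == Y.argmax(1)))
```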

The second challenge is that analytic learning relies on matrix operations over the entire dataset, which can exhaust memory. A block-wise recursive Moore-Penrose inverse (BRMP) method is therefore proposed, which reformulates the original analytic learning exactly into a block-wise alternative using a block-wise decomposition of the Moore-Penrose inverse. BRMP reduces memory consumption while keeping the high training efficiency, and it also handles the potential rank-deficient matrix inversion during analytic learning.
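
The goal of an exact block-wise reformulation can be illustrated with the identity pinv(X) = pinv(X^T X) X^T: the least-squares weights can be assembled from per-block sums, so the full data matrix never needs to be held in memory, and pinv copes with a rank-deficient accumulation. This simple sketch is only a stand-in for BRMP, which instead works through a block-wise decomposition of the Moore-Penrose inverse itself.

```python
import numpy as np

# Sketch of the block-wise goal behind BRMP (not the thesis's recursion):
# using the identity pinv(X) = pinv(X.T @ X) @ X.T, the minimum-norm
# least-squares weights can be assembled from per-block sums, so the full
# data matrix never has to be held in memory, and pinv also copes with a
# rank-deficient accumulation matrix.
def blockwise_least_squares(blocks, d_in, d_out):
    A = np.zeros((d_in, d_in))
    B = np.zeros((d_in, d_out))
    for Xb, Yb in blocks:                  # stream the dataset block by block
        A += Xb.T @ Xb
        B += Xb.T @ Yb
    return np.linalg.pinv(A) @ B           # exact minimum-norm solution

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 20))
Y = rng.standard_normal((1000, 3))
blocks = [(X[i:i + 100], Y[i:i + 100]) for i in range(0, 1000, 100)]
W_block = blockwise_least_squares(blocks, 20, 3)
W_full = np.linalg.pinv(X) @ Y             # reference: full-batch solution
print("max difference vs full batch:", np.abs(W_block - W_full).max())
```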