Speeding up deep neural network training with decoupled and analytic learning

Training deep neural networks usually demands a long period of time. This thesis explores methods in two areas, decoupled learning and analytic learning, to reduce the training time; the full description appears below.

Bibliographic Details
Main Author: Zhuang, Huiping
Other Authors: Lin, Zhiping
School: School of Electrical and Electronic Engineering
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2021
Subjects: Engineering::Electrical and electronic engineering
Online Access: https://hdl.handle.net/10356/153079
DOI: 10.32657/10356/153079
Citation: Zhuang, H. (2021). Speeding up deep neural network training with decoupled and analytic learning. Doctoral thesis, Nanyang Technological University, Singapore.
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Institution: Nanyang Technological University

Description

Training deep neural networks usually demands a long period of time. In this thesis, we explore methods in two areas, decoupled learning and analytic learning, to reduce the training time.

In decoupled learning, new methods are proposed to alleviate the sequential nature of backpropagation (BP), the most common means of training deep neural networks. BP requires a sequential passing of activations and gradients, a constraint known as the forward, backward, and update lockings. These lockings impose strong synchronism among modules (consecutive stacks of layers), leaving most modules idle during training. A fully decoupled learning method using delayed gradients (FDG) is first proposed to address all three lockings, and it improves training efficiency with a significant acceleration.
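
To make the locking and delayed-gradient idea concrete, here is a minimal numpy sketch (a toy illustration, not the FDG algorithm from the thesis): a two-module linear network in which module 1 updates with the gradient signal that module 2 produced one iteration earlier, so neither module waits for the other within an iteration. The network sizes, learning rate, and regression task are invented for the example.

```python
import numpy as np

# Toy illustration of training with delayed gradients (in the spirit of FDG,
# not the thesis implementation): a two-module linear network where module 1
# updates with the gradient signal that module 2 produced one iteration
# earlier, so neither module waits for the other within an iteration.
rng = np.random.default_rng(0)
n, d, h, c = 256, 20, 32, 5
X = rng.standard_normal((n, d))
Y = X @ rng.standard_normal((d, c))              # toy regression targets

W1 = rng.standard_normal((d, h)) / np.sqrt(d)    # module 1 weights
W2 = rng.standard_normal((h, c)) / np.sqrt(h)    # module 2 weights
lr = 0.02
delayed = None                                   # (stored input, delayed gradient w.r.t. H)

for step in range(300):
    H = X @ W1                         # module 1 forward
    pred = H @ W2                      # module 2 forward
    dpred = 2.0 * (pred - Y) / n
    dW2 = H.T @ dpred                  # module 2 backward, applied immediately
    dH = dpred @ W2.T                  # signal sent back towards module 1 ...
    W2 -= lr * dW2
    if delayed is not None:            # ... but consumed one iteration late
        X_old, dH_old = delayed
        W1 -= lr * (X_old.T @ dH_old)
    delayed = (X, dH)

print("final training loss:", float(np.mean((X @ W1 @ W2 - Y) ** 2)))
```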

Decoupled learning, however, inevitably introduces asynchronism, which causes gradient staleness (also known as the stale-gradient effect) and can degrade generalization or even cause divergence. An accumulated decoupled learning (ADL) method is therefore developed to cope with the staleness issue. ADL is shown, both theoretically and empirically, to reduce gradient staleness, and it generalizes better than existing decoupled methods that ignore staleness.
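
The accumulation idea behind ADL can be sketched on a toy objective: gradients arrive with a fixed staleness, and rather than applying each one immediately, the module sums a small window of them and performs a single averaged update, so the incoming gradients refer to fewer outdated parameter versions. The quadratic loss, delay, and window size below are illustrative assumptions, not the thesis's formulation.

```python
import numpy as np

# Rough sketch of the accumulation idea behind ADL (a hypothetical
# simplification, not the thesis algorithm). A module receives gradients that
# are `delay` steps stale; rather than applying each one immediately, it sums
# a window of `accum` gradients and applies a single averaged update, so the
# incoming gradients refer to fewer outdated parameter versions.
def staleness_demo(delay=2, accum=4, lr=0.1, steps=200):
    rng = np.random.default_rng(1)
    w = rng.standard_normal(10)                       # module parameters
    history = [w.copy() for _ in range(delay + 1)]    # past parameter versions
    buffer = np.zeros_like(w)
    applied = 0
    for _ in range(steps):
        w_stale = history[0]               # gradient is computed on stale params
        grad = 2.0 * w_stale               # toy objective: minimise ||w||^2
        buffer += grad
        applied += 1
        if applied == accum:               # one update per accumulation window
            w = w - lr * buffer / accum
            buffer = np.zeros_like(w)
            applied = 0
        history = history[1:] + [w.copy()]
    return float(np.sum(w ** 2))

print("final toy loss:", staleness_demo())
```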

New methods are also developed in analytic learning, which discards BP entirely and trains the network with analytical solutions; training completes within a single epoch and is therefore exceedingly fast. The first challenge in this area is finding analytical solutions for multilayer networks: existing methods suffer from limitations such as structural constraints or the requirement of invertible activation functions. A correlation projection network (CPNet) is developed that removes these limitations by treating the network as a combination of multiple two-layer modules. Analytic learning of CPNet becomes possible once the label information is projected into the hidden modules, so that each two-layer module can solve its locally supervised learning problem analytically with least-squares solutions.
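
A minimal sketch of analytic learning for one two-layer module follows. The random projection matrix P is only a hypothetical stand-in for CPNet's label projection into the hidden module; the essential point is that, once label information reaches the hidden layer, both weight matrices have closed-form least-squares solutions via the Moore-Penrose inverse and no backpropagation is needed.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_two_layer_module(X, Y, hidden=64):
    """Analytically train one 2-layer module with least squares (no BP).

    The random projection P is only a hypothetical stand-in for CPNet's label
    projection into the hidden module; once label information reaches the
    hidden layer, both weight matrices have closed-form solutions via the
    Moore-Penrose inverse.
    """
    P = rng.standard_normal((Y.shape[1], hidden))
    H_target = Y @ P                          # labels projected to hidden width
    W1 = np.linalg.pinv(X) @ H_target         # least-squares input weights
    H = np.maximum(X @ W1, 0.0)               # hidden activations (ReLU)
    W2 = np.linalg.pinv(H) @ Y                # least-squares output weights
    return W1, W2

# One-pass "training" on toy data: no iterative optimisation at all.
X = rng.standard_normal((500, 30))
Y = np.eye(4)[rng.integers(0, 4, size=500)]   # one-hot labels
W1, W2 = train_two_layer_module(X, Y)
pred = np.maximum(X @ W1, 0.0) @ W2
print("train accuracy:", np.mean(pred.argmax(1) == Y.argmax(1)))
```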

The second challenge is that analytic learning relies on matrix operations over the entire dataset, which can exhaust memory. A block-wise recursive Moore-Penrose inverse (BRMP) method is therefore proposed, which reformulates the original analytic learning exactly into a block-wise alternative using a block-wise decomposition of the Moore-Penrose inverse. BRMP reduces memory consumption while keeping the high training efficiency, and it also handles the potential rank-deficient matrix inversion during analytic learning.
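
The goal of an exact block-wise reformulation can be illustrated with the identity pinv(X) = pinv(X^T X) X^T: the least-squares weights can be assembled from per-block sums, so the full data matrix never needs to be held in memory, and pinv copes with a rank-deficient accumulation. This simple sketch is only a stand-in for BRMP, which instead works through a block-wise decomposition of the Moore-Penrose inverse itself.

```python
import numpy as np

# Sketch of the block-wise goal behind BRMP (not the thesis's recursion):
# using the identity pinv(X) = pinv(X.T @ X) @ X.T, the minimum-norm
# least-squares weights can be assembled from per-block sums, so the full
# data matrix never has to be held in memory, and pinv also copes with a
# rank-deficient accumulation matrix.
def blockwise_least_squares(blocks, d_in, d_out):
    A = np.zeros((d_in, d_in))
    B = np.zeros((d_in, d_out))
    for Xb, Yb in blocks:                  # stream the dataset block by block
        A += Xb.T @ Xb
        B += Xb.T @ Yb
    return np.linalg.pinv(A) @ B           # exact minimum-norm solution

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 20))
Y = rng.standard_normal((1000, 3))
blocks = [(X[i:i + 100], Y[i:i + 100]) for i in range(0, 1000, 100)]
W_block = blockwise_least_squares(blocks, 20, 3)
W_full = np.linalg.pinv(X) @ Y             # reference: full-batch solution
print("max difference vs full batch:", np.abs(W_block - W_full).max())
```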