Speeding up deep neural network training with decoupled and analytic learning
Main Author:
Other Authors:
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2021
Subjects:
Online Access: https://hdl.handle.net/10356/153079
Institution: Nanyang Technological University
Summary: Training deep neural networks usually demands a significant amount of time. In this thesis, we explore methods in two areas, decoupled learning and analytic learning, to reduce the training time.
In decoupled learning, new methods are proposed to alleviate the sequential nature of backpropagation (BP), the most common means of training deep neural networks. BP requires a sequential passing of activations and gradients, a constraint known as the lockings (i.e., the forward, backward, and update lockings). These lockings impose strong synchronism among modules (consecutive stacks of layers), leaving most modules idle during training. A fully decoupled learning method using delayed gradients (FDG) is first proposed to address all three lockings, improving training efficiency and achieving a significant acceleration. However, decoupled learning inevitably introduces asynchronism that causes gradient staleness (also known as the stale gradient effect), which degrades generalization performance or even leads to divergence. An accumulated decoupled learning (ADL) method is therefore developed to cope with the staleness issue. The ADL is shown, both theoretically and empirically, to reduce gradient staleness, and it demonstrates improved generalization compared with current works that ignore the staleness.
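To make the delayed-gradient idea concrete, the sketch below is a minimal single-process PyTorch illustration, not the thesis's FDG or ADL implementation: a network is split into two modules, and the first module is updated with the gradient that the second module hands back one step late. The toy regression task, the module split, and the one-step delay are assumptions made purely for illustration; the actual methods run the modules on separate workers and, in ADL, accumulate gradients to reduce staleness.

```python
# Minimal sketch of two-module training with a one-step-delayed gradient.
# Hypothetical toy setup; not the FDG/ADL code from the thesis.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression data: y is the (noisy) sum of the inputs.
X = torch.randn(512, 8)
y = X.sum(dim=1, keepdim=True) + 0.1 * torch.randn(512, 1)

# One network split into two sequential modules with separate optimizers.
m1 = nn.Sequential(nn.Linear(8, 16), nn.Tanh())
m2 = nn.Linear(16, 1)
opt1 = torch.optim.SGD(m1.parameters(), lr=0.05)
opt2 = torch.optim.SGD(m2.parameters(), lr=0.05)

delayed_grad = None   # gradient w.r.t. m1's output, received one step late
stored_h = None       # m1 output whose graph that gradient belongs to

for step in range(200):
    idx = torch.randint(0, X.size(0), (64,))
    xb, yb = X[idx], y[idx]

    # Module 1: first apply the stale gradient handed back by module 2 at the
    # previous step, then run this step's forward pass.
    if delayed_grad is not None:
        opt1.zero_grad()
        stored_h.backward(delayed_grad)   # backpropagate the delayed gradient
        opt1.step()
    h = m1(xb)
    stored_h = h                          # keep the graph for the next update

    # Module 2: trains on a detached copy, so its backward pass stops here;
    # the only signal sent back to module 1 is the delayed gradient below.
    h_in = h.detach().requires_grad_(True)
    loss = nn.functional.mse_loss(m2(h_in), yb)
    opt2.zero_grad()
    loss.backward()
    opt2.step()
    delayed_grad = h_in.grad.detach()     # handed to module 1 at the next step

print(f"final minibatch loss: {loss.item():.4f}")
```

Because the second module works on a detached copy of the activation, its backward pass never reaches the first module directly; the delayed gradient is the only coupling between the two updates, which is what introduces the staleness discussed above.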
New methods are also developed in the area of analytic learning, which discards BP entirely and trains the network with analytical solutions. Analytic learning trains neural networks exceedingly fast, since training completes within a single epoch. There are two main challenges in this area. The first lies in the difficulty of finding analytical solutions for multilayer networks; existing methods suffer from limitations such as structural constraints or the requirement of invertible activation functions. A correlation projection network (CPNet) is developed that removes these limitations by treating the network as a combination of multiple 2-layer modules. The analytic learning of CPNet becomes possible once the label information is projected into the hidden modules, so that each 2-layer module can solve its locally supervised learning problem analytically with least squares solutions. The other challenge is that implementing analytic learning involves matrix operations over the entire dataset, which creates a potential memory bottleneck. Hence, a block-wise recursive Moore-Penrose inverse (BRMP) method is proposed, which exactly reformulates the original analytic learning into a block-wise alternative via a block-wise decomposition of the Moore-Penrose inverse. The BRMP not only reduces memory consumption while retaining high training efficiency, but also handles the potential rank-deficient matrix inversion issue during analytic learning.
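The flavor of analytic learning can be illustrated with the following NumPy sketch, which is an assumption-laden stand-in rather than CPNet or BRMP themselves: the output weights of a single 2-layer module are obtained in closed form from a regularized least squares problem, and the same solution is then re-derived block by block so that only one chunk of the dataset is in memory at a time. The random hidden layer, the ridge term lam, and the block size of 200 are illustrative choices; BRMP itself works through an exact block-wise decomposition of the Moore-Penrose inverse.

```python
# Minimal sketch of analytic (least squares) learning for one 2-layer module,
# plus a block-wise pass over the data. Hypothetical setup; not CPNet/BRMP code.
import numpy as np

rng = np.random.default_rng(0)

# Toy classification data: 1000 samples, 20 features, 5 one-hot classes.
X = rng.standard_normal((1000, 20))
Y = np.eye(5)[rng.integers(0, 5, size=1000)]

# Hidden layer: a fixed random expansion with a nonlinearity (an assumption
# standing in for however the module's hidden representation is produced).
W1 = rng.standard_normal((20, 64))
H = np.tanh(X @ W1)

# Output layer solved in closed form:
#   W2 = argmin_W ||H W - Y||^2 + lam ||W||^2
# The small ridge term also guards against rank deficiency.
lam = 1e-3
W2 = np.linalg.solve(H.T @ H + lam * np.eye(64), H.T @ Y)

# Block-wise variant: accumulate H^T H and H^T Y over chunks of the dataset,
# so memory scales with the block size instead of the number of samples.
A = np.zeros((64, 64))
B = np.zeros((64, 5))
for start in range(0, len(X), 200):
    Hb = np.tanh(X[start:start + 200] @ W1)
    A += Hb.T @ Hb
    B += Hb.T @ Y[start:start + 200]
W2_block = np.linalg.solve(A + lam * np.eye(64), B)

print(np.allclose(W2, W2_block))  # True: the block-wise pass is exact
```

The final check confirms that the block-wise pass reproduces the full-batch solution up to floating-point error, which mirrors the exact-equivalence property claimed for BRMP while keeping only one block of data in memory at a time.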