Autonomous deep learning for continual learning in complex data stream environment
Format: Thesis (Doctor of Philosophy)
Language: English
Published: Nanyang Technological University, 2021
Online Access: https://hdl.handle.net/10356/154462
Institution: Nanyang Technological University
Summary: The last decade has seen growing attention to the processing of infinite data sequences that are generated quickly in an online fashion. These data can be structured or unstructured. In this situation, a continual learning algorithm is required to continually learn and craft important knowledge from the data, hence offering positive forward transfer, i.e., the impact that learning a task has on future task performance [gem2017paz]; a model with high positive forward transfer is good at utilizing past knowledge to improve its predictions on incoming tasks. The algorithm must also be able to memorize previously seen tasks, that is, successfully circumvent the catastrophic forgetting problem, a phenomenon in which performance on old tasks decreases dramatically in the presence of new tasks. To further achieve good performance in a complex data stream environment, a continual learning algorithm should also be able to handle two major challenges, i.e., concept changes and label availability. The former causes a model trained on previously seen data batches to become out-of-date. The latter indicates that the labeling process must be accomplished and is subject to access to the ground truth; further, there may exist semi-supervised or unsupervised situations where labels are scarce or never available, respectively. These challenges demand an algorithm that is capable of adapting to concept changes with or without labels. Separately, deep neural networks (DNNs) have shown promising results in processing any form of data. However, the static and offline nature of a DNN hinders its implementation for addressing the evolving characteristics of data streams.
The goal of this research is to present step-by-step solutions to these data stream problems using a flexible DNN that is able to incrementally increase or decrease its capacity with respect to problem complexity. The first contribution is presented in the third chapter, which handles the problem of how to incrementally increase the network capacity with respect to problem complexity. An incremental learning algorithm of DNNs for evolving data streams, namely Autonomous Deep Learning (ADL), is proposed. It is characterized by a flexible network structure whose hidden nodes and hidden layers can be incrementally constructed from scratch. The network significance (NS) formula, derived from the bias-variance trade-off (as model complexity rises, bias decreases but variance increases, resulting in a U-shaped test error curve), is utilized to control the network complexity. This formula works by monitoring possible underfitting and overfitting of the DNN. To further identify the loss of generalization power, a Drift Detection Scenario (DDS) is carried out to signal a real drift situation, which triggers the construction of a new hidden layer. The concept of a different-depth network structure is put forward specifically to combat the catastrophic forgetting problem caused by hidden layer evolution. This network structure enables ADL to retain the knowledge constructed in previous layers and thus maintain predictive performance whenever a new layer is added.
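The following is a minimal sketch, not the thesis implementation, of the different-depth idea described above: every hidden layer owns a local softmax head, the final prediction is a weighted vote over heads, and a drift signal simply stacks a new layer without disturbing the older heads. Names such as `vote_weights` and `add_layer` are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DifferentDepthNet(nn.Module):
    """Each hidden layer carries its own output head (a different-depth structure)."""

    def __init__(self, in_dim, hidden_dim, n_classes):
        super().__init__()
        self.hidden = nn.ModuleList([nn.Linear(in_dim, hidden_dim)])
        self.heads = nn.ModuleList([nn.Linear(hidden_dim, n_classes)])
        self.vote_weights = [1.0]  # per-head voting weight

    def add_layer(self, hidden_dim, n_classes):
        # Called when a drift detector signals a real drift: stack a new
        # hidden layer and give it its own output head; old heads are kept.
        prev_dim = self.hidden[-1].out_features
        self.hidden.append(nn.Linear(prev_dim, hidden_dim))
        self.heads.append(nn.Linear(hidden_dim, n_classes))
        self.vote_weights.append(1.0)

    def forward(self, x):
        votes, h = [], x
        for layer, head, w in zip(self.hidden, self.heads, self.vote_weights):
            h = torch.relu(layer(h))
            votes.append(w * F.softmax(head(h), dim=-1))
        return torch.stack(votes).sum(dim=0)  # weighted vote over all depths
```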
In the fourth chapter, we explore a strategy to expand the capacity of an MLP structure with respect to problem complexity. We formalize this method as a Neural Network with Dynamically Evolved Capacity (NADINE). The main bottleneck in evolving an MLP structure is the loss of performance whenever a new hidden layer is added; we call this problem "the catastrophic forgetting induced by structural evolution". Its major cause is the random initialization of the newly added layer. The problem is mitigated by a sample storage and replay mechanism. It is understood that retaining samples and retraining the model should be avoided in the context of data stream learning; for that reason, a method to collect important samples from the streaming data, namely adaptive memory, is developed. The collected samples are then used to retrain the network whenever a real drift is confirmed by DDS. Because every hidden layer is constructed under a different concept, it is important to govern the amount of update in each hidden layer according to its relevance to the output. The soft-forgetting strategy is proposed to handle this objective: it translates the correlation between each hidden layer and the output into an independent learning rate for every hidden layer.
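A minimal sketch of the soft-forgetting idea, under the assumption that layer relevance can be summarized by a correlation-like score: each hidden layer gets its own learning rate, scaled by how correlated that layer's activations are with the network output. The helper `layer_output_correlation` is a hypothetical stand-in for the thesis' relevance measure, not its exact formula.

```python
import torch

def layer_output_correlation(activations, outputs):
    # Mean absolute Pearson-style correlation between hidden units and outputs.
    a = activations - activations.mean(0, keepdim=True)
    o = outputs - outputs.mean(0, keepdim=True)
    cov = a.T @ o / (len(a) - 1)
    denom = a.std(0).unsqueeze(1) * o.std(0).unsqueeze(0) + 1e-8
    return (cov / denom).abs().mean().item()

def build_soft_forgetting_optimizer(layers, correlations, base_lr=0.01):
    # One parameter group per hidden layer; the learning rate of a layer is
    # scaled by its estimated relevance to the output.
    groups = [{"params": layer.parameters(), "lr": base_lr * corr}
              for layer, corr in zip(layers, correlations)]
    return torch.optim.SGD(groups, lr=base_lr)
```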
In the fifth chapter, we revisit the concept of the denoising autoencoder (DAE) as a generative training mechanism that addresses the random initialization problem of DNNs. From the viewpoint of continual learning in the streaming environment, it encourages DNNs to adapt to concept changes while waiting for the labeling process. Motivated by these facts, we propose a deep evolving denoising autoencoder (DEVDAN). The main feature of DEVDAN lies in the coupled generative-discriminative learning phase, which enables DEVDAN to exploit both labeled and unlabeled samples. The coupled learning phase is executed alternately whenever a data batch arrives, creating a continual learning cycle. The evolving trait of DEVDAN is carried out in both phases, attempting to handle concept changes with or without labels. In the generative phase, the evolving mechanism is controlled by the NS formula derived from the squared reconstruction error.
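A minimal sketch, illustrative rather than DEVDAN's exact procedure, of the coupled generative-discriminative cycle: every incoming batch is first used without labels to refine a denoising autoencoder (tracking the squared reconstruction error), and then, once labels arrive, the shared encoder is refined discriminatively. The architecture sizes and the Gaussian corruption are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenoisingAEClassifier(nn.Module):
    def __init__(self, in_dim=784, hidden=64, n_classes=10):
        super().__init__()
        self.encoder = nn.Linear(in_dim, hidden)
        self.decoder = nn.Linear(hidden, in_dim)
        self.classifier = nn.Linear(hidden, n_classes)

    def generative_step(self, x, opt, noise=0.1):
        # Reconstruct the clean input from a corrupted copy.
        x_tilde = x + noise * torch.randn_like(x)
        recon = self.decoder(torch.relu(self.encoder(x_tilde)))
        loss = F.mse_loss(recon, x)  # squared reconstruction error
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

    def discriminative_step(self, x, y, opt):
        # Refine the shared encoder once labels become available.
        logits = self.classifier(torch.relu(self.encoder(x)))
        loss = F.cross_entropy(logits, y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

# Usage per data batch: generative_step(xb, opt), then, when labels arrive,
# discriminative_step(xb, yb, opt), forming the continual learning cycle.
```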
The general trend of deep learning research changed dramatically midway through the work on this thesis, when Yann LeCun stated in a presentation that unsupervised and semi-supervised learning are the major challenges of AI for the next decade. Semi-supervised learning behaves similarly to supervised learning in that it learns a function from input-output pairs of data. However, it has additional features to address the lack of labeled samples; that is, the algorithm can automatically generate labels from the input data. From the viewpoint of evolving data streams, this can be adopted to circumvent semi-supervised or infinitely delayed label situations in data stream environments. The sixth chapter of this thesis presents a self-evolving deep neural network, namely Parsimonious Network (ParsNet), as a solution to the semi-supervised data stream problem. It incorporates the coupled generative-discriminative learning phase, exploiting both labeled and unlabeled samples. Further, a self-labeling strategy with hedge (SLASH) completes the coupled learning phase by augmenting labeled samples and minimizing the effect of self-labeling mistakes in a semi-supervised situation.
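A minimal sketch of the general self-labeling pattern, assumed here as an illustration rather than SLASH itself: unlabeled samples receive pseudo-labels from the current model, only confident predictions are kept, and their loss is down-weighted by the prediction confidence so that likely self-labeling mistakes contribute less to the update.

```python
import torch
import torch.nn.functional as F

def self_label_loss(model, x_unlabeled, threshold=0.9):
    # Generate pseudo-labels from the model's own predictions.
    with torch.no_grad():
        probs = F.softmax(model(x_unlabeled), dim=-1)
        conf, pseudo_y = probs.max(dim=-1)

    mask = conf >= threshold  # keep only confident guesses
    if mask.sum() == 0:
        return torch.tensor(0.0, requires_grad=True)

    logits = model(x_unlabeled[mask])
    per_sample = F.cross_entropy(logits, pseudo_y[mask], reduction="none")
    return (conf[mask] * per_sample).mean()  # confidence-weighted penalty
```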
A deep convolutional neural network (CNN) is desirable for data streams since it is able to extract better features, especially in the case of unstructured data streams. Nonetheless, CNN training for data streams is hindered by its static nature and the expensive labeling cost. These demands call for an unsupervised data stream algorithm that is capable of processing both structured and unstructured data. In the seventh chapter, an Autonomous Deep Convolutional Network (ADCN) is presented. It combines a generative learning phase and self-clustering to learn better representations from the data in an unsupervised manner. The generative network involves a flexible MLP structure that is capable of evolving its hidden nodes and hidden layers on demand, guided by the NS formula and DDS, respectively. The self-clustering mechanism is performed in the deep embedding space of every layer, while the final output is inferred by summing up the local outputs. Also, a latent-based regularization is employed to enable ADCN to circumvent the catastrophic forgetting problem in a continual learning environment.
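A minimal sketch, built on illustrative assumptions rather than ADCN's exact mechanism, of self-clustering in a latent space combined with a latent-based regularizer: new latent codes are assigned to their nearest centroid, centroids are updated toward their members, and a penalty keeps latent codes close to the centroids anchored on earlier tasks to reduce forgetting.

```python
import torch

def assign_and_update(z, centroids, lr=0.05):
    # z: (batch, d) latent codes; centroids: (k, d) cluster prototypes.
    dist = torch.cdist(z, centroids)      # pairwise distances, (batch, k)
    assign = dist.argmin(dim=1)           # nearest-centroid assignment
    for k in assign.unique():
        mean_z = z[assign == k].mean(dim=0)
        centroids[k] = (1 - lr) * centroids[k] + lr * mean_z
    return assign, centroids

def latent_regularizer(z, old_centroids, assign):
    # Penalize drift of latent codes away from centroids learned on
    # previous tasks (a simple latent-based anti-forgetting term).
    return ((z - old_centroids[assign]) ** 2).sum(dim=1).mean()
```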
From the numerical study of NADINE, it is found that model selection plays a key role in the successful application of CNNs. This triggers the development of an autonomous method that can generate an appropriate structure for a given problem. Most of the research pursues two directions: Reinforcement Learning (RL)-based and Evolutionary Computation (EC)-based approaches. Those approaches, however, require huge computational resources since they all generate architectures in a trial-and-error manner. For this reason, a novel data-driven architecture learning method, namely the Autonomous CNN (AutoCNN), is proposed in the eighth chapter. It is capable of incrementally constructing a CNN architecture for a given problem from scratch. It differs from EC- and RL-based methods in that it is data-driven and does not involve any trial-and-error mechanism. CNN layer growing, filter pruning, and an early-stopping condition are combined to realize the structural evolution. A novel Feature Separability Score (FSS), which measures the feature separability of the CNN, is proposed to control layer growing. A high FSS value triggers the addition of convolutional layers, attempting to obtain a more meaningful representation.
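A minimal sketch of a data-driven layer-growing loop in this spirit. The separability score below (a Fisher-style within-class to between-class scatter ratio, higher when features are still poorly separated) is a hypothetical stand-in for the thesis' FSS, and `grow_conv_block` is an assumed helper that appends one convolutional block to the current feature extractor.

```python
import torch

def separability_score(features, labels):
    # Higher when class means are close relative to within-class spread,
    # i.e. when the extracted features are still poorly separated.
    overall = features.mean(dim=0)
    within, between = 0.0, 0.0
    for c in labels.unique():
        fc = features[labels == c]
        within += ((fc - fc.mean(dim=0)) ** 2).sum()
        between += len(fc) * ((fc.mean(dim=0) - overall) ** 2).sum()
    return (within / (between + 1e-8)).item()

def grow_until_separable(extractor, grow_conv_block, x, y,
                         threshold=1.0, max_layers=8):
    # Keep adding convolutional blocks while the features remain poorly
    # separable, up to a fixed depth budget (no trial-and-error search).
    for _ in range(max_layers):
        feats = extractor(x).flatten(start_dim=1)
        if separability_score(feats, y) < threshold:
            break
        extractor = grow_conv_block(extractor)
    return extractor
```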
The numerical studies show that the NS, DDS, and FSS formulas are successful in guiding the structural evolution on the given problems, exploiting both structured and unstructured data. This is achieved with a reasonable execution time that obeys the limitations imposed by data stream environments. The different-depth network structure and the adaptive memory mechanism appear to be the key components in mitigating "the catastrophic forgetting induced by structural evolution". Separately, the coupled generative-discriminative learning phase and SLASH form a holistic solution for semi-supervised learning in a non-stationary environment. Another important finding is that the combination of the self-clustering mechanism and latent-based regularization delivers better accuracy than most baselines in dealing with unsupervised learning problems. Further, this mechanism successfully reduces the risk of catastrophic forgetting in continual learning environments; that is, it delivers good performance on previous tasks after learning a new task. These findings support the implementation of a flexible DNN and the proposed learning policies as a continual learning algorithm in a complex data stream environment. The proposed methods are applicable and, in some cases, have been applied to other kinds of tasks, i.e., transfer learning and control algorithms, but this thesis does not explore such applications. The source code and raw numerical results of this thesis are available at https://tinyurl.com/AutonomousDL.
To conclude, this thesis presents six major contributions to the study of data stream learning using flexible DNNs, as follows:
Development of a flexible different-depth DNN for data stream learning which is proficient in addressing concept drift and catastrophic forgetting due to structural evolution.
Development of an MLP structure with dynamic model capacity for data stream learning without catastrophic forgetting.
Development of a flexible DNN with a coupled generative-discriminative learning phase for exploiting both labeled and unlabeled samples in streaming environments.
Development of a flexible DNN for handling situations with a lack of labeled samples in streaming environments.
Development of an evolving DNN for unsupervised and continual learning in streaming environments which can prevent catastrophic forgetting.
Development of a learning mechanism to incrementally construct a CNN structure for a given problem.