Development of interpretable spiking neural network for multiclass classification
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2022 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/156352 |
Institution: | Nanyang Technological University |
Summary: | Spiking Neural Networks (SNNs) are the third generation of artificial neural networks; they process inputs asynchronously through spikes. A spike is a discrete event in the temporal domain, which gives SNNs an additional dimension, time, for processing inputs. The way SNNs process information is more biologically realistic and computationally more powerful than that of Analog Neural Networks (An-NNs). This paved the way for the development of SNNs that mimic the brain's information processing capability for several cognitive tasks. This thesis aims to address the gaps in the interpretation capability of SNNs while also improving their generalization capability with a lower computational load.
In SNNs, the weight model between two spiking neurons modulates the amplitude of the information propagating from one neuron to the other. Most developments in SNNs use a single weight model. Single weight models are adapted from long-term plasticity models in biology; in the SNN context, this means the weight does not change after training. However, biological neural networks also contain many types of dynamic synaptic plasticity that modulate information propagation. In the SNN context, a dynamic synaptic plasticity model can be interpreted as a dynamic weight that changes for different input spikes even after training. Dynamic weights improve the computational power of networks with asynchronous inputs, such as SNNs, because the asynchronicity of the inputs can fully exploit the dynamic nature of the weights. However, adapting dynamic weights to a supervised learning framework is a bottleneck: during learning, only a single weight is learned, while the dynamics of the weight evoked by the input spikes are predefined. Dynamic weights fell out of use in SNNs mainly because of the heavy computational load of modeling the weight dynamics for every incoming spike.
Instead of directly adapting dynamic weights from biology, this thesis proposes a learnable time-varying weight model suited to SNNs. The time-varying weight model is a continuous function that is learned within a supervised learning framework. It has characteristics of both the long-term plasticity model and the dynamic synaptic plasticity model: the learned function does not change after training, yet it yields different weights for different input spikes. Time-varying weights exploit the asynchronicity of the inputs in a more meaningful manner and change the connectivity between spiking neurons. This opens up an entirely new area of learning-algorithm development, because algorithms developed for SNNs with a single weight model cannot be directly applied to train SNNs with time-varying weights.
To this end, this thesis proposes the Synaptic Efficacy Function-based leaky-integrate-and-fire neuRON (SEFRON), a binary classifier with a single output neuron and time-varying weights. In SEFRON, the input neurons are directly connected to one output neuron via time-varying weights. Real-valued inputs are converted to spike times using a population encoding scheme (temporal encoding). A normalized spike-timing-dependent plasticity (STDP) rule is developed to train SEFRON. The time window of the output spike is split into two regions, and the region in which the output neuron fires determines the predicted class label for a given input. The normalized STDP rule determines a single-valued weight update, which is embedded in a Gaussian function centred at the input spike time to produce the time-varying weight update. The resulting time-varying weight is therefore a sum of amplitude-modulated Gaussian functions centred at different times. In other weight models a weight is either positive or negative, whereas a time-varying weight can take both positive and negative values at different times; this enables it to encode more information in a single link. SEFRON is evaluated in terms of architecture, computational time, and accuracy. The results show that SEFRON, with a single neuron and time-varying weights, performs comparably to multi-layer, multi-neuron SNN classifiers with single weight models and dynamic weight models. The high performance of SEFRON with a single neuron is attributed to the computational power of the time-varying weights.
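The two building blocks described above, population encoding of a real-valued feature into spike times and a time-varying weight formed by summing Gaussian-embedded weight updates, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the Gaussian receptive-field encoding with spike time `T_MAX * (1 - response)`, the constants `N_FIELDS`, `T_MAX`, and `SIGMA_W`, and the names `population_encode` and `TimeVaryingWeight` are all hypothetical. The normalized STDP rule itself is not reproduced; the `delta_w` values below are placeholders for whatever scalar update such a rule would produce.

```python
import math

# Hypothetical parameters (not taken from the thesis): 6 receptive fields per
# feature, a 3 ms encoding window, and a Gaussian width of 0.5 ms for w(t).
N_FIELDS = 6
T_MAX = 3.0
SIGMA_W = 0.5


def population_encode(x, n_fields=N_FIELDS, t_max=T_MAX):
    """Map a feature value x in [0, 1] to one spike time per receptive field.

    Assumed scheme: each field is a Gaussian tuned to a centre in [0, 1];
    a strong response gives an early spike, a weak response a late one.
    """
    centres = [i / (n_fields - 1) for i in range(n_fields)]
    width = 1.0 / (n_fields - 1)  # assumed overlap between neighbouring fields
    times = []
    for c in centres:
        response = math.exp(-((x - c) ** 2) / (2 * width ** 2))  # in (0, 1]
        times.append(t_max * (1.0 - response))
    return times


class TimeVaryingWeight:
    """w(t) as a sum of amplitude-modulated Gaussians.

    Each update embeds a scalar weight change (e.g. from an STDP-style rule)
    in a Gaussian centred at the presynaptic spike time.
    """

    def __init__(self, sigma=SIGMA_W):
        self.sigma = sigma
        self.terms = []  # list of (amplitude, centre) pairs

    def update(self, delta_w, spike_time):
        self.terms.append((delta_w, spike_time))

    def __call__(self, t):
        return sum(a * math.exp(-((t - c) ** 2) / (2 * self.sigma ** 2))
                   for a, c in self.terms)


if __name__ == "__main__":
    print(population_encode(0.4))  # earliest spike from the field centred at 0.4
    w = TimeVaryingWeight()
    w.update(0.8, spike_time=1.0)   # placeholder positive update
    w.update(-0.5, spike_time=2.5)  # placeholder negative update
    print(w(1.0) > 0, w(2.5) < 0)   # True True
```

The last two lines illustrate the point made in the text: after two updates of opposite sign, the same link is excitatory for spikes near 1.0 and inhibitory for spikes near 2.5.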
SEFRON is limited to binary classification tasks; hence, this thesis extends the time-varying weight model in SNNs to multiclass classification problems. To this end, the single-neuron architecture is expanded to a Spiking Neural Network with time-varying weights (SNN-t), in which the predicted class label for a given input is determined by the output neuron that fires earliest. Real-valued inputs are encoded into spike times, and time-varying weight updates are produced from single-valued weight updates, following the same procedure as in SEFRON. This thesis proposes three algorithms to train the SNN-t architecture for multiclass classification. First, the normalized STDP-based algorithm developed for SEFRON is modified to suit the multiclass setting (Mc-SEFRON). Second, a meta-neuron-based learning algorithm (MeST) is developed to improve the generalization ability of SNN-t. Finally, a gradient descent-based learning algorithm (GradST) is developed to further improve generalization and to handle big datasets. For Mc-SEFRON and MeST, the error propagated from one layer to another has to be calculated heuristically, which makes them most suitable for shallow architectures with a single learnable layer; GradST, in contrast, scales to multi-layer architectures. Mc-SEFRON, MeST, and GradST are evaluated on UCI benchmark datasets: MeST performs better on small datasets, and GradST performs better on big datasets. The scalability of GradST is demonstrated on the MNIST, JAFFE, and CIFAR10 image datasets. On MNIST and JAFFE, although GradST falls short of the state of the art (SOTA), it performs substantially better than an SNN with a single weight model and the same architecture as SNN-t. For CIFAR10, a hybrid model combining ResNet50 and SNN-t is built. The hybrid model slightly improves the accuracy of the baseline ResNet model and also significantly improves the model's robustness against gradient-based adversarial attacks.
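The earliest-spike decision rule of SNN-t can be illustrated with a small forward pass. This is a hedged sketch, not the thesis model: it assumes a spike-response-style potential in which each input spike at time `s_i` contributes `w_i(s_i) * kernel(t - s_i)`, an alpha-shaped kernel, a fixed threshold `theta`, and a discrete time grid. The helper names `gaussian_weight`, `spike_kernel`, `first_spike_time`, and `predict` are hypothetical, and none of the training algorithms (Mc-SEFRON, MeST, GradST) are shown.

```python
import math


def gaussian_weight(amplitude, centre, sigma=0.5):
    """Hypothetical time-varying weight with a single Gaussian term."""
    return lambda t: amplitude * math.exp(-((t - centre) ** 2) / (2 * sigma ** 2))


def spike_kernel(s, tau=1.0):
    """Assumed alpha-shaped postsynaptic response; zero before the spike, peak 1 at s = tau."""
    if s <= 0:
        return 0.0
    return (s / tau) * math.exp(1 - s / tau)


def first_spike_time(input_times, weight_fns, theta, t_end=10.0, dt=0.01):
    """Return the first grid time at which the potential reaches theta, else inf.

    Each input spike at time s_i is scaled by its weight sampled at s_i,
    so the same link gives different weights to differently timed spikes.
    """
    effective = [w(s) for w, s in zip(weight_fns, input_times)]
    steps = int(round(t_end / dt))
    for k in range(steps + 1):
        t = k * dt
        v = sum(a * spike_kernel(t - s) for a, s in zip(effective, input_times))
        if v >= theta:
            return t
    return math.inf


def predict(input_times, weights_per_class, theta=1.0):
    """Predicted label = index of the output neuron that fires first."""
    times = [first_spike_time(input_times, ws, theta) for ws in weights_per_class]
    label = min(range(len(times)), key=times.__getitem__)
    return label, times


if __name__ == "__main__":
    inputs = [0.5, 2.0]  # two presynaptic spike times
    weights = [
        [gaussian_weight(1.2, 0.5), gaussian_weight(0.2, 0.5)],  # class 0 neuron
        [gaussian_weight(0.2, 2.0), gaussian_weight(1.2, 2.0)],  # class 1 neuron
    ]
    print(predict(inputs, weights)[0])  # 0
```

A neuron that never reaches threshold is assigned an infinite spike time, so it can only win if no output neuron fires; how the thesis handles that case is not stated in this summary.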
Subsequently, this thesis addresses the interpretation of predictions made by SNNs for multiclass classification. It proposes a weight transformation method that transforms the weighted spike response from the temporal domain to the feature space, yielding a Generalized Additive Model (GAM). GAMs are inherently interpretable and are predominantly used for binary classification problems. The GAM obtained from an SNN-t is referred to as the Spiking Additive Model (SAM). In a multiclass setting, the inherent interpretability of GAMs diminishes because of the presence of multiple shape functions. This thesis therefore also proposes a postprocessing method for multiclass GAMs that improves the visualization of multiple shape functions and provides a relative interpretation for multiclass classification problems. The SAMs obtained from Mc-SEFRON, MeST, and GradST are evaluated on large UCI benchmark datasets. The performance difference between each SAM and its corresponding SNN-t classifier is minimal, indicating the effectiveness of the weight transformation method.
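To make the multiclass GAM form concrete, the sketch below computes class scores as an intercept plus a sum of per-feature shape functions, then applies one possible relative-interpretation adjustment: subtracting, for each feature, the mean of that feature's shape functions across classes. This is an illustrative, commonly used centring step, not necessarily the postprocessing proposed in the thesis, and it does not implement the thesis's weight transformation from SNN-t to SAM. The function names and the toy shape functions are hypothetical. Because the same value is subtracted from every class score, the predicted class is unchanged.

```python
def gam_scores(x, shape_fns, intercepts):
    """Class scores of a multiclass GAM: score_c(x) = b_c + sum_j f_{c,j}(x_j)."""
    return [b + sum(f(xj) for f, xj in zip(fns, x))
            for fns, b in zip(shape_fns, intercepts)]


def centre_across_classes(shape_fns):
    """Return g_{c,j}(v) = f_{c,j}(v) - mean_k f_{k,j}(v) for every class c, feature j.

    Each centred curve reads as evidence for class c relative to the average
    class; every class score shifts by the same amount, so the argmax is kept.
    """
    n_classes = len(shape_fns)
    n_features = len(shape_fns[0])

    def make(c, j):
        def g(v):
            mean = sum(shape_fns[k][j](v) for k in range(n_classes)) / n_classes
            return shape_fns[c][j](v) - mean
        return g

    return [[make(c, j) for j in range(n_features)] for c in range(n_classes)]


def argmax(scores):
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy model: 3 classes, 2 features (hypothetical shape functions).
    shape_fns = [
        [lambda v: 2 * v, lambda v: 0.5],
        [lambda v: v * v, lambda v: -v],
        [lambda v: 1.0, lambda v: v],
    ]
    intercepts = [0.0, 0.1, -0.2]
    x = (0.7, 0.3)
    raw = gam_scores(x, shape_fns, intercepts)
    centred = gam_scores(x, centre_across_classes(shape_fns), intercepts)
    print(argmax(raw), argmax(centred))  # 0 0
```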
Finally, the methods proposed in this thesis are applied to real-world credit scoring problems. The SNN-t classifiers have the lowest class bias and outperform all other non-interpretable classifiers, including shallow classifiers and deep learning methods. The performance and advantages of the SNN-t classifiers are preserved in their respective SAMs, which outperform other interpretable classifiers. The performance of a SAM can be improved by improving the performance of its SNN-t. The high performance, low class bias, and interpretability of SAMs make them a more favorable choice for high-stakes decision-making applications. |