A study on open set recognition methods

Bibliographic Details
Main Author: Sun, Xin
Other Authors: Ling Keck Voon
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2021
Online Access: https://hdl.handle.net/10356/152461
Institution: Nanyang Technological University
Description
Summary: Deep learning has achieved state-of-the-art performance in many computer vision tasks, but several challenges remain when applying it to real-world problems. One typical challenge is that knowledge is incomplete during the training phase: when a sample from an unknown class is fed into the system during the testing phase, the network wrongly recognizes it as one of the known classes. A potential solution is Open Set Recognition (OSR), which assumes that knowledge is incomplete during training and aims not only to classify samples of known classes but also to detect unknown samples during testing. OSR methods have gained increasing attention in recent years. Because information can only be extracted from known samples, some OSR methods rely on generative models, such as Generative Adversarial Networks (GAN) or Variational Auto-Encoders (VAE), to detect unknown samples. Other OSR methods synthesize unknown data or redistribute the prediction scores among known classes to directly compute prediction scores for unknown samples. Although these methods achieve state-of-the-art performance in computer vision tasks, most of them have long running times and are not applicable to real-time industrial tasks. To shorten the running time, we propose an OSR method called Discriminative Loss. We combine the proposed loss function with the Softmax loss function, which is used in most Convolutional Neural Networks (CNNs), to force the learned features of different classes to lie close to different centroids for Gaussian modeling. The proposed method is demonstrated on the MNIST dataset. It is also applied to detecting air leakage in pneumatic train door subsystems, where it achieves promising classification accuracy with a shorter running time.
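The idea of pairing the Softmax loss with a centroid-pulling term can be illustrated with a minimal sketch. The record does not give the exact form of Discriminative Loss, so the version below assumes a center-loss-style penalty (half the squared distance to the class centroid). The function names, the `weight` hyperparameter, and the distance threshold used for rejection are all hypothetical, and the threshold rule is a simplified stand-in for the Gaussian modeling mentioned in the summary.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def centroid_pull(feature, centroid):
    """Half the squared Euclidean distance from a feature to a centroid."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, centroid))


def combined_loss(logits, feature, label, centroids, weight=0.1):
    """Softmax loss plus a weighted centroid term.

    `weight` is a hypothetical hyperparameter, not a value from the thesis.
    """
    pull = centroid_pull(feature, centroids[label])
    return softmax_cross_entropy(logits, label) + weight * pull


def is_unknown(feature, centroids, threshold):
    """Assumed rejection rule: unknown if far from every known centroid."""
    nearest = min(math.dist(feature, c) for c in centroids)
    return nearest > threshold
```

In this sketch, a feature that lands near a known centroid is classified as usual, while a feature far from all centroids is rejected as unknown. How the threshold is chosen is left open here.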
Although the proposed Discriminative Loss method can be successfully applied to industrial sensor data, its performance is unsatisfactory on more complex data (i.e., images). To achieve higher accuracy on image datasets, we propose a novel OSR method called Conditional Probabilistic Generative Models (CPGM). The core insight of this method is to add discriminative information to probabilistic generative models (i.e., Variational Auto-encoders (VAE) and Adversarial Auto-encoders (AAE)), so that the proposed models can not only detect unknown samples but also classify known classes by forcing different latent features to approximate class-conditional Gaussian distributions. Auto-encoders (AE) are widely used to generate latent features with desired statistical characteristics for OSR. AE can also extract class-specific features through their reconstruction training strategy, which requires the network to restore the input image at the pixel level. However, this strategy is often over-demanding for OSR, since class-specific features are generally contained in the target objects rather than in all pixels. To address this shortcoming, we propose a mutual information-based method with a streamlined architecture, Maximal Mutual Information Open Set Recognition (M2IOSR). M2IOSR uses only an encoder to extract class-specific features by maximizing the mutual information between the given input and its latent features across multiple scales. To further reduce the open space risk, latent features are constrained to class-conditional Gaussian distributions by a KL-divergence loss function. We compare the three proposed methods on six standard image datasets (MNIST, SVHN, CIFAR-10, CIFAR-100, ImageNet, and LSUN) and on air pressure data collected from a pneumatic train door subsystem. None of the proposed methods significantly reduces closed-set classification accuracy while realizing OSR.
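The KL-divergence constraint on latent features mentioned above has a well-known closed form when both distributions are diagonal Gaussians. The sketch below assumes the encoder outputs a mean and log-variance per latent dimension and that each class-conditional target is a Gaussian with identity covariance centered on a class mean. These choices, and the name `kl_to_class_gaussian`, are illustrative assumptions rather than details taken from the thesis.

```python
import math


def kl_to_class_gaussian(mu, log_var, class_mean):
    """KL( N(mu, diag(exp(log_var))) || N(class_mean, I) ).

    Per dimension: 0.5 * (sigma^2 + (mu - c)^2 - 1 - log sigma^2).
    The identity-covariance target is an assumption of this sketch.
    """
    return 0.5 * sum(
        math.exp(lv) + (m - c) ** 2 - 1.0 - lv
        for m, lv, c in zip(mu, log_var, class_mean)
    )
```

The term vanishes when the latent distribution matches the class target exactly and grows quadratically as the latent mean drifts away from the class mean, which is what pushes features of each known class toward their own Gaussian.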
In experiments on standard image datasets, CPGM significantly outperforms the baseline methods, and M2IOSR achieves state-of-the-art performance. In experiments on air pressure data, the Gaussian-based models (Discriminative Loss, CPGM-VAE, and CPGM-AAE) all achieve promising results. Among them, CPGM-AAE achieves the highest F1 scores in both the extension phase and the retraction phase. It is also worth noting that, compared with the other OSR methods, Discriminative Loss has the smallest number of parameters, and its running time is only 0.2 milliseconds per input slower than that of the fastest method (Softmax).