Disentangling latent space of variational autoencoder with distribution dependent guarantees for out-of-distribution detection and reasoning
Cyber-physical systems (CPS) have diverse applications, especially in safety-critical settings such as autonomous vehicles (AVs). In safety-critical systems, any mistake can lead to irreversible consequences, such as loss of life. Therefore, ensuring the safety of such systems is vital. Many saf...
Main Author: | Rahiminasab Zahra Reza (Zahra Rahiminasab) |
---|---|
Other Authors: | Arvind Easwaran |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University 2024 |
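Subjects: | Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision; Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence |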
Online Access: | https://hdl.handle.net/10356/172959 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-172959 |
---|---|
record_format | dspace |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision; Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence |
spellingShingle | Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Rahiminasab Zahra Reza (Zahra Rahiminasab) Disentangling latent space of variational autoencoder with distribution dependent guarantees for out-of-distribution detection and reasoning |
description |
Cyber-physical systems (CPS) have diverse applications, especially in safety-critical
settings such as autonomous vehicles (AVs). In safety-critical systems, any mistake
can lead to irreversible consequences, such as loss of life. Therefore,
ensuring the safety of such systems is vital. Many safety-critical CPS use machine
learning (ML) models for tasks such as object detection and image
segmentation. As a result, there is a need for approaches that examine the safety
of the predictions generated by such ML models.
One of the problems in examining the safety of ML models is the out-of-distribution
(OOD) problem. In the OOD problem, the objective is to identify
test samples that are drawn from distributions different from the distribution of
the training samples. For example, in an object detection application, if the ML model
is trained on daytime images and the test sample is a night-time image, the test sample
should be identified as OOD. The OOD problem can be divided into
two subproblems: OOD detection and OOD reasoning. In OOD detection, the goal
is to decide whether or not a sample is OOD. In OOD reasoning, by contrast,
the objective is to explain the OOD behavior.
Approaches based on one-class classifiers perform poorly on real-world
datasets with multi-label data and overlapping partitions. For example, in the
nuScenes dataset, each image has multiple labels, such as pedestrian presence and
weather. An image belonging to the low-rain-intensity partition can also
belong to the no-pedestrian partition. As a result, in our first attempt, we create
an ensemble of β-variational autoencoders (EBVAE) as OOD detectors that
can be trained with multi-label data. A β-variational autoencoder (β-VAE)
maps each input to a lower-dimensional latent representation and reconstructs the
image from the learned representation. Each β-VAE in the ensemble learns a representation
corresponding to one generative factor. Generative factors are the
factors of an image that are necessary for its reconstruction. This thesis focuses
on meteorological (such as rain intensity) and background generative factors.
Addressing the OOD problem for these factors is challenging because they affect
all the pixels of the image and can depend on one another. In each β-VAE, we identify the
most sensitive representation dimension for each generative factor.
Disentanglement means establishing one-to-many maps between generative factors and
their representative latent dimensions. Disentangling the latent space of a VAE or
its variants is a key step for OOD reasoning when a single VAE is used to learn known
generative factors corresponding to the meteorological features and background of
an image. Disentangling the latent space of a VAE is only possible with an inductive bias on the
model or supervision on the data. In our second attempt to resolve the OOD detection
and reasoning problem for multi-label data, we use a single VAE to reduce OOD
inference overhead. We impose a bias on the OOD model through hyperparameter
tuning to disentangle the generative factors. We call the resulting model the hyperparameter-based
disentangled β-VAE (HPVAE). We use change-point detection approaches
to account for the effect of time dependency between data samples on OOD detection
and reasoning.
The HPVAE-based solution generates one-to-many maps between generative factors
and their corresponding latent dimensions that can differ between training and inference
time. As a result, in our third effort to solve the OOD detection and reasoning
problem, we incorporate disentanglement into the training process by adding disentanglement
constraints as regularization terms to the loss function. We use match-pairing
weak supervision to train a VAE with a disentangled latent space. In
the match-pairing setting, samples are divided into groups such that samples in
the same group share the same value or range of values for specific generative factors.
Achieving total disentanglement, even with supervision, is impossible in practice
for reasons such as the presence of unknown generative factors and dependencies between
generative factors. The disentanglement constraints are expressed in fuzzy
logic, since fuzzy logic helps to formalize partial disentanglement. We call the
model trained by this approach the weakly supervised logic variational autoencoder
(WDLVAE).
Finally, we propose the disentangled distilled VAE (DDV) to reduce the model size
while preserving disentanglement properties. The motivation for
this framework is to reduce the resources required when OOD reasoners
and detectors are deployed on resource-constrained devices. For model compression, we use
student-teacher knowledge distillation. To ensure that disentanglement is preserved
during model compression, we formulate distillation as a constrained optimization
problem with disentanglement constraints. To provide a theoretical guarantee for
disentanglement during distillation, we analyze the optimality of the obtained solutions
and use generalization bounds. We also evaluate our approach empirically by
deploying the compressed model on a resource-constrained device. |
author2 | Arvind Easwaran |
author_facet | Arvind Easwaran Rahiminasab Zahra Reza (Zahra Rahiminasab) |
format | Thesis-Doctor of Philosophy |
author | Rahiminasab Zahra Reza (Zahra Rahiminasab) |
author_sort | Rahiminasab Zahra Reza (Zahra Rahiminasab) |
title | Disentangling latent space of variational autoencoder with distribution dependent guarantees for out-of-distribution detection and reasoning |
publisher | Nanyang Technological University |
publishDate | 2024 |
url | https://hdl.handle.net/10356/172959 |
_version_ | 1789968686952480768 |
spelling |
sg-ntu-dr.10356-172959 2024-02-01T09:53:44Z Disentangling latent space of variational autoencoder with distribution dependent guarantees for out-of-distribution detection and reasoning Rahiminasab Zahra Reza (Zahra Rahiminasab) Arvind Easwaran School of Computer Science and Engineering arvinde@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Doctor of Philosophy 2024-01-08T05:31:17Z 2024-01-08T05:31:17Z 2023 Thesis-Doctor of Philosophy Rahiminasab Zahra Reza (Zahra Rahiminasab) (2023). Disentangling latent space of variational autoencoder with distribution dependent guarantees for out-of-distribution detection and reasoning. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/172959 10.32657/10356/172959 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University |
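
The abstract describes β-VAE models whose latent dimensions are mapped one-to-many to generative factors and then used for OOD detection and reasoning. The following is a minimal sketch of that idea and not the thesis's implementation: it assumes a diagonal-Gaussian encoder with a standard normal prior, a squared-error reconstruction term, and a per-factor score equal to the summed KL divergence of the latent dimensions assigned to that factor. The function names, the `factor_dims` map, the threshold, and `beta=4.0` are all hypothetical choices made for illustration.

```python
import math


def kl_per_dimension(mu, logvar):
    """KL( N(mu_i, exp(logvar_i)) || N(0, 1) ) for each latent dimension i."""
    return [0.5 * (m * m + math.exp(lv) - lv - 1.0) for m, lv in zip(mu, logvar)]


def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """Squared-error reconstruction term plus a beta-weighted KL term.

    beta=4.0 is an arbitrary illustrative value, not one taken from the thesis.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + beta * sum(kl_per_dimension(mu, logvar))


def ood_flags(mu, logvar, factor_dims, threshold):
    """Flag each generative factor whose assigned latent dimensions
    have a summed KL divergence above the (assumed) threshold."""
    kl = kl_per_dimension(mu, logvar)
    return {
        factor: sum(kl[d] for d in dims) > threshold
        for factor, dims in factor_dims.items()
    }


if __name__ == "__main__":
    # Hypothetical one-to-many map: "rain" uses two latent dimensions,
    # "background" uses one.
    factor_dims = {"rain": [0, 1], "background": [2]}
    # Hypothetical encoder output for one test image.
    mu = [2.0, 0.0, 0.0]
    logvar = [0.0, 0.0, 0.0]
    print(ood_flags(mu, logvar, factor_dims, threshold=1.0))
```

Because each factor has its own set of latent dimensions, the returned dictionary says not only whether a sample looks OOD but also which factor is responsible, which is the distinction the abstract draws between OOD detection and OOD reasoning.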