Image restoration and representation with deep generative models
Format: Thesis (Doctor of Philosophy)
Language: English
Published: Nanyang Technological University, 2025
Online Access: https://hdl.handle.net/10356/182543
Institution: Nanyang Technological University
Summary: A fundamental challenge in computer vision lies in accurately modeling and characterizing image distributions. For example, in high-level vision tasks, better representations of images in the latent space can significantly enhance downstream processes such as image classification and segmentation. Similarly, in image restoration, a more accurate model of the distribution of clean images conditioned on degraded images can yield results with better perceptual quality. However, modeling these distributions is highly challenging due to the high dimensionality of both images and their latent codes.
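To make the restoration setting concrete: writing x for a clean image and y for its degraded observation, the standard Bayesian decomposition (a generic formulation, not notation taken from the thesis itself) factors the posterior as

p(x \mid y) \propto p(y \mid x) \, p(x),

where p(y \mid x) is the degradation likelihood and p(x) is the natural-image prior; deep generative models are attractive here precisely because they can supply a rich model of p(x).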
In a relatively parallel research avenue, deep generative models have made remarkable strides in implicitly or explicitly modeling image distributions, offering potent tools for handling intricate distributions. However, applying these models directly is infeasible and leads to suboptimal performance because of mismatched domain priors and task formulations. How to seamlessly and effectively integrate generative models into image restoration and representation tasks, while incorporating task-specific priors, remains an open area for further exploration.
This thesis explores the potential of deep generative models in image restoration and representation tasks, spanning from a high-level vision task, namely image classification, to low-level vision tasks, namely image compression and restoration. First, for image classification, we introduce a novel variational inference framework that yields a latent representation with enhanced generalization ability. We implicitly model the posterior distributions of images given their latent codes using a generative adversarial network, disentangling domain-invariant features from the provided training data.
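For orientation, the standard variational-inference objective (the evidence lower bound in its textbook form; the thesis's exact objective may differ) bounds the log-likelihood of an image x with latent code z as

\log p(x) \ge \mathbb{E}_{q(z \mid x)}[\log p(x \mid z)] - \mathrm{KL}\big(q(z \mid x) \,\|\, p(z)\big),

where the adversarially trained network takes the role of the decoder term p(x \mid z), modeling it implicitly through a critic rather than through an explicit likelihood.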
Subsequently, instead of implicitly modeling the posterior distributions, we explore modeling them explicitly. We leverage normalizing flows, a class of generative models that yields the exact likelihood of a given sample, and apply them to low-light image enhancement (LLIE), a task that suits their inductive bias. While the flow-based model achieves promising results, its performance is constrained by intrinsic limitations in the design of normalizing flows. We therefore integrate the prior knowledge of the LLIE task in raw image space into a diffusion framework to overcome these design limitations.
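The exact-likelihood property mentioned above follows from the change-of-variables formula underlying all normalizing flows: for an invertible map f sending an image x to a latent z = f(x) with a simple base density p_Z,

\log p_X(x) = \log p_Z(f(x)) + \log \left| \det \frac{\partial f(x)}{\partial x} \right|.

The very constraints that make this formula tractable, invertibility and an efficiently computable Jacobian determinant, are the design limitations alluded to above, and they are what the diffusion framework sidesteps.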
Although our diffusion-based solution achieves promising results, it has two main limitations. First, even with the unique advantages of enhancement in raw space, the significant storage overhead of raw images hampers practical application. Consequently, in the subsequent two works, we explore joint compression by accurately modeling the latent distributions of images and leveraging autoregressive models to further enhance coding efficiency. Second, while the proposed method shortens the inference path from the tens or hundreds of steps used by common diffusion models to only three, it still requires iterative evaluations and thus incurs inference overhead.
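The coding-efficiency gain from autoregressive modeling comes from the chain-rule factorization of the latent distribution (shown here in its generic form; the thesis's conditioning structure may differ):

p(z) = \prod_{i=1}^{N} p(z_i \mid z_{<i}).

Sharper conditionals mean lower entropy per latent symbol, and hence shorter expected code length under entropy coding.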
In the last work, we propose an acceleration strategy that incorporates distillation with a novel self-consistency loss. With only one inference step, we achieve state-of-the-art performance on the super-resolution task, where the conditional distribution is relatively more complex.
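As a rough sketch of how consistency-style distillation enables single-step inference (a generic formulation from the literature, not the thesis's exact loss), the student f_\theta is trained so that its clean-image predictions agree along a single diffusion trajectory:

\mathcal{L} = \mathbb{E}\left[ d\big( f_\theta(x_t, t), \; f_{\theta^-}(x_{t'}, t') \big) \right],

where x_t and x_{t'} are adjacent points on the same trajectory, \theta^- is a slowly updated teacher copy of the student weights, and d is a distance measure; once predictions are self-consistent across the trajectory, a single evaluation suffices.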
In summary, this thesis makes three primary contributions. First, we showcase the effectiveness and unique advantages of generative image restoration and representation methods. Second, we propose diverse approaches that seamlessly integrate the capabilities of deep generative models with the domain knowledge specific to image restoration and representation tasks. Third, we validate the effectiveness of the proposed methods with extensive experiments across multiple datasets; the results demonstrate that our methods outperform prior state-of-the-art models. The efforts and achievements presented in this thesis not only underscore the practical capabilities of image restoration and representation techniques but also provide fundamental support for future research and applications within the industry.