Image and video super-resolution in the wild


Bibliographic Details
Main Author: Chan, Kelvin Cheuk Kit
Other Authors: Chen Change Loy
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2022
Subjects:
Online Access:https://hdl.handle.net/10356/160140
Institution: Nanyang Technological University
Description
Summary: With the increasing demand for high-resolution content, there is a need to develop super-resolution techniques that improve the resolution of images and videos captured by non-professional imaging devices. Researchers have made incessant efforts to improve the resolution of images and videos to improve user experience and enhance performance in downstream tasks. However, most existing approaches focus on designing an image-to-image mapping and fail to exploit auxiliary information that is readily available in practice. As a result, such methods often suffer from suboptimal effectiveness and efficiency owing to inadequate information aggregation and large network complexity. In addition, it remains nontrivial to generalize to uncontrolled scenes, whose degradations can be complex, diverse, and unknown. This thesis proposes solutions for effective image and video super-resolution, and for generalization to real-world degradations, by exploiting generative priors and temporal information.

The thesis first demonstrates that pre-trained Generative Adversarial Networks (GANs), e.g., StyleGAN, can be used as a latent bank to improve the restoration quality of large-factor image super-resolution (SR). Our method, Generative LatEnt bANk (GLEAN), goes beyond existing practices by directly leveraging the rich and diverse priors encapsulated in a pre-trained GAN. GLEAN can be easily incorporated into a simple encoder-bank-decoder architecture with multi-resolution skip connections. Images upscaled by GLEAN show clear improvements in fidelity and texture faithfulness over existing methods.

Second, we study the underlying mechanism of deformable alignment, which shows compelling performance in aligning multiple frames for video super-resolution.
Specifically, we show that deformable convolution can be decomposed into a combination of spatial warping and convolution, revealing that deformable alignment and flow-based alignment share a common formulation but differ in a key aspect: offset diversity. Based on these observations, we propose an offset-fidelity loss that guides offset learning with optical flow. Experiments show that our loss successfully avoids offset overflow and alleviates the instability of deformable alignment.

Third, we reconsider the most essential components of video super-resolution, guided by four basic functionalities: Propagation, Alignment, Aggregation, and Upsampling. By reusing existing components with minimal redesigns, we arrive at a succinct pipeline, BasicVSR, that achieves appealing improvements in speed and restoration quality over many state-of-the-art algorithms. We conduct a systematic analysis to explain how such gains are obtained and discuss the pitfalls. We further show the extensibility of BasicVSR by presenting IconVSR and BasicVSR++. IconVSR contains an information-refill mechanism to alleviate error accumulation and a coupled propagation scheme to facilitate information flow during propagation. BasicVSR++ further enhances propagation and alignment with second-order grid propagation and flow-guided deformable alignment. Our BasicVSR series significantly outperforms existing works in both efficiency and output quality.

Fourth, we provide solutions to the unique challenges of real-world video super-resolution in inference and training, induced by the diversity and complexity of degradations. First, we introduce an image pre-cleaning stage that reduces noise and artifacts prior to propagation, substantially improving output quality. Second, we analyze and address the problems arising from the increased computational burden of the task.
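The decomposition of deformable convolution into spatial warping plus ordinary convolution, noted above for deformable alignment, can be illustrated with a minimal 1D sketch. This is our own illustration, not the thesis implementation: it uses integer offsets so no interpolation is needed, and all function names are hypothetical.

```python
def deformable_conv1d(x, w, offsets):
    """Deformable convolution: y[i] = sum_k w[k] * x[i + k + offsets[k]],
    with zero padding outside the signal (one offset per kernel tap)."""
    n, K = len(x), len(w)
    y = [0.0] * n
    for i in range(n):
        for k in range(K):
            j = i + k + offsets[k]
            if 0 <= j < n:
                y[i] += w[k] * x[j]
    return y

def warp_then_conv1d(x, w, offsets):
    """Equivalent form: warp the input once per kernel tap, then apply a
    plain sliding-window convolution over the warped copies."""
    n, K = len(x), len(w)
    # One warped copy per tap: warped[k][j] = x[j + offsets[k]], zero-padded.
    warped = [[x[j + offsets[k]] if 0 <= j + offsets[k] < n else 0.0
               for j in range(n + K - 1)] for k in range(K)]
    y = [0.0] * n
    for i in range(n):
        for k in range(K):
            y[i] += w[k] * warped[k][i + k]
    return y

x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [1.0, 0.5]
offsets = [0, 1]  # per-tap offsets; flow-based alignment would use one shared offset
assert deformable_conv1d(x, w, offsets) == warp_then_conv1d(x, w, offsets)
```

With one offset per kernel tap the two forms coincide exactly; restricting every tap to a single shared offset recovers flow-based alignment, which is the "offset diversity" distinction drawn in the summary.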
In addition, to facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences with rich textures and patterns. Our dataset can serve as a common ground for benchmarking.
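The offset-fidelity loss mentioned in the summary, which guides offset learning with optical flow, might be sketched as a thresholded penalty of the following shape. The thresholded L1 form, the threshold t, and all names here are our assumptions for illustration, not the thesis's exact formulation.

```python
def offset_fidelity_loss(offsets, flow, t=1.0):
    """Hypothetical sketch: penalize learned offsets that deviate from the
    guiding optical flow by more than a threshold t, discouraging offset
    overflow while still permitting offset diversity near the flow."""
    loss = 0.0
    for o, f in zip(offsets, flow):
        d = abs(o - f)
        if d > t:  # offsets close to the flow are left unconstrained
            loss += d
    return loss

# An offset near the flow incurs no penalty; a runaway offset is penalized.
assert offset_fidelity_loss([0.5, 3.0], [0.0, 0.0], t=1.0) == 3.0
```

The dead zone around the flow is the point of the design: the loss stabilizes training by bounding offsets without collapsing them onto the flow, which would forfeit the benefit of offset diversity.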