Image and video super-resolution in the wild

Bibliographic Details
Main Author: Chan, Kelvin Cheuk Kit
Other Authors: Chen Change Loy
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2022
Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Online Access:https://hdl.handle.net/10356/160140
DOI: 10.32657/10356/160140
Citation: Chan, K. C. K. (2022). Image and video super-resolution in the wild. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/160140
School: School of Computer Science and Engineering
Contact: ccloy@ntu.edu.sg
Grants: I1901E0052; 2018-T1-002-056; NTU SUG grant; NTU NAP
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0)
Institution: Nanyang Technological University

Full description

With the growing demand for high-resolution content, super-resolution techniques are needed to improve the resolution of images and videos captured by non-professional imaging devices. Researchers have made sustained efforts to increase the resolution of images and videos, both to improve user experience and to boost performance in downstream tasks. However, most existing approaches focus on designing an image-to-image mapping and fail to exploit auxiliary information that is readily available in practice. As a result, such methods often suffer from suboptimal effectiveness and efficiency owing to inadequate information aggregation and large network complexity. In addition, generalizing to uncontrolled scenes, whose degradations can be complex, diverse, and unknown, remains nontrivial. This thesis proposes solutions for effective image and video super-resolution, and for generalization to real-world degradations, by exploiting generative priors and temporal information.

The thesis first demonstrates that a pre-trained Generative Adversarial Network (GAN), e.g., StyleGAN, can be used as a latent bank to improve the restoration quality of large-factor image super-resolution (SR). Our method, Generative LatEnt bANk (GLEAN), goes beyond existing practices by directly leveraging the rich and diverse priors encapsulated in a pre-trained GAN. GLEAN can be easily incorporated into a simple encoder-bank-decoder architecture with multi-resolution skip connections. Images upscaled by GLEAN show clear improvements in fidelity and texture faithfulness compared to existing methods.

Second, we study the underlying mechanism of deformable alignment, which shows compelling performance in aligning multiple frames for video super-resolution. Specifically, we show that deformable convolution can be decomposed into a combination of spatial warping and convolution (see the sketch below), revealing that deformable alignment and flow-based alignment share a common formulation but differ in a key aspect: offset diversity. Based on these observations, we propose an offset-fidelity loss that guides offset learning with optical flow. Experiments show that the loss prevents offset overflow and alleviates the instability of deformable alignment.

Third, we reconsider the most essential components of video super-resolution, guided by four basic functionalities: propagation, alignment, aggregation, and upsampling (a minimal skeleton is sketched below). By reusing existing components with minimal redesign, we obtain a succinct pipeline, BasicVSR, that achieves appealing improvements in speed and restoration quality over many state-of-the-art algorithms. We conduct a systematic analysis to explain how these gains are obtained and discuss the pitfalls. We further show the extensibility of BasicVSR by presenting IconVSR and BasicVSR++. IconVSR adds an information-refill mechanism to alleviate error accumulation and a coupled propagation scheme to facilitate information flow during propagation. BasicVSR++ further enhances propagation and alignment with second-order grid propagation and flow-guided deformable alignment. Our BasicVSR series significantly outperforms existing works in both efficiency and output quality.
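The decomposition claimed above, that deformable alignment amounts to spatial warping followed by convolution, can be illustrated with a small PyTorch sketch. This is an illustrative reconstruction rather than code from the thesis: it assumes the special case in which every kernel sampling location shares a single offset (offset diversity of one), and the helper `flow_warp`, the tensors `feat`, `offset`, `flow`, and the simple L1 stand-in for the offset-fidelity loss are hypothetical names and simplifications introduced here.

```python
# Minimal sketch, not thesis code: deformable alignment viewed as
# "spatial warping + convolution" when all kernel sampling locations
# share one offset (offset diversity = 1).
import torch
import torch.nn.functional as F


def flow_warp(feat, offset):
    """Warp feat (N, C, H, W) by a per-pixel offset (N, 2, H, W), in pixels."""
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=feat.dtype, device=feat.device),
        torch.arange(w, dtype=feat.dtype, device=feat.device),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + offset[:, 0]  # horizontal displacement
    grid_y = ys.unsqueeze(0) + offset[:, 1]  # vertical displacement
    # Normalize to [-1, 1] as expected by grid_sample (align_corners=True).
    grid = torch.stack(
        (2.0 * grid_x / max(w - 1, 1) - 1.0, 2.0 * grid_y / max(h - 1, 1) - 1.0),
        dim=-1,
    )
    return F.grid_sample(feat, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)


feat = torch.randn(1, 16, 32, 32)       # feature of a neighbouring frame
offset = torch.randn(1, 2, 32, 32)      # one offset per spatial location
weight = torch.randn(16, 16, 3, 3)      # an ordinary 3x3 convolution kernel

# Warp first, then convolve: the single-offset case of deformable alignment.
aligned = F.conv2d(flow_warp(feat, offset), weight, padding=1)

# A simplified stand-in for the offset-fidelity idea: keep learned offsets
# close to a precomputed optical flow (the thesis' exact loss may differ).
flow = torch.randn(1, 2, 32, 32)
offset_fidelity_loss = F.l1_loss(offset, flow)
```

A full deformable convolution generalizes this by predicting a distinct offset for each kernel sampling location, which is the offset diversity that separates it from plain flow-based warping.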
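The four functionalities that organize BasicVSR (propagation, alignment, aggregation, and upsampling) can likewise be sketched as a tiny recurrent pipeline. The class below is a minimal skeleton under stated assumptions, not the published BasicVSR implementation: it propagates in one direction only, stubs out flow estimation and alignment, and all names (`TinyRecurrentVSR`, `aggregate`, `upsample`) are hypothetical.

```python
# Minimal sketch, not the published BasicVSR code: a recurrent pipeline
# organized around Propagation, Alignment, Aggregation, and Upsampling.
# Only one propagation direction is shown and alignment is stubbed out.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyRecurrentVSR(nn.Module):
    def __init__(self, channels=64, scale=4):
        super().__init__()
        self.channels, self.scale = channels, scale
        # Aggregation: fuse the current frame with the propagated feature.
        self.aggregate = nn.Sequential(
            nn.Conv2d(3 + channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Upsampling: pixel-shuffle reconstruction to the target resolution.
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def align(self, feat, prev_frame, cur_frame):
        # Alignment placeholder: a real pipeline warps `feat` towards the
        # current frame using optical flow or deformable offsets.
        return feat

    def forward(self, frames):                           # (N, T, 3, H, W)
        n, t, _, h, w = frames.shape
        feat = frames.new_zeros(n, self.channels, h, w)
        outputs = []
        # Propagation: sweep through time, carrying a hidden feature along.
        # (BasicVSR propagates bidirectionally; one direction shown here.)
        for i in range(t):
            cur, prev = frames[:, i], frames[:, max(i - 1, 0)]
            feat = self.align(feat, prev, cur)                     # Alignment
            feat = self.aggregate(torch.cat([cur, feat], dim=1))   # Aggregation
            base = F.interpolate(cur, scale_factor=self.scale,
                                 mode="bilinear", align_corners=False)
            outputs.append(self.upsample(feat) + base)             # Upsampling
        return torch.stack(outputs, dim=1)               # (N, T, 3, sH, sW)


# Example: a 5-frame low-resolution clip upscaled 4x.
video = torch.randn(1, 5, 3, 32, 32)
sr = TinyRecurrentVSR()(video)                           # (1, 5, 3, 128, 128)
```

IconVSR and BasicVSR++ extend this template with information refill, coupled or second-order grid propagation, and flow-guided deformable alignment in place of the stubbed `align`.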
Fourth, we provide solutions to the unique challenges that the diversity and complexity of real-world degradations pose to video super-resolution, in both inference and training. First, we introduce an image pre-cleaning stage that reduces noise and artifacts prior to propagation, substantially improving output quality. Second, we analyze and address the problems arising from the increased computational burden of the task. In addition, to facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences with rich textures and patterns and can serve as a common ground for benchmarking.
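The image pre-cleaning stage described above can also be illustrated. The cleaning network, the iteration budget, and the stopping tolerance below are hypothetical stand-ins chosen for the sketch; the thesis' actual module and refinement schedule may differ.

```python
# Hedged sketch of a pre-cleaning stage for real-world VSR: each frame is
# passed through a lightweight cleaning network to suppress noise and
# artifacts before entering the recurrent super-resolution stage.
import torch
import torch.nn as nn


class CleaningModule(nn.Module):
    """A small residual denoiser applied image-to-image at low resolution."""

    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)        # predict a residual correction


def pre_clean(frames, cleaner, max_iters=3, tol=1e-3):
    """Repeatedly clean each frame until further cleaning changes little."""
    out = frames                       # (N, T, 3, H, W)
    for _ in range(max_iters):
        refined = torch.stack(
            [cleaner(out[:, i]) for i in range(out.size(1))], dim=1)
        if (refined - out).abs().mean() < tol:
            return refined
        out = refined
    return out


lq_video = torch.randn(1, 5, 3, 64, 64)        # a degraded low-quality clip
cleaned = pre_clean(lq_video, CleaningModule())
# `cleaned` would then be fed to a recurrent VSR network for upsampling.
```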