Image and video super-resolution in the wild

Bibliographic Details
Main Author: Chan, Kelvin Cheuk Kit
Other Authors: Chen Change Loy
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2022
Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Online Access:https://hdl.handle.net/10356/160140
DOI: 10.32657/10356/160140
Citation: Chan, K. C. K. (2022). Image and video super-resolution in the wild. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/160140
School: School of Computer Science and Engineering
Contact: ccloy@ntu.edu.sg
Grants: I1901E0052; 2018-T1-002-056; NTU SUG grant; NTU NAP
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0)
Institution: Nanyang Technological University

Full description

With the growing demand for high-resolution content, super-resolution techniques are needed to improve the resolution of images and videos captured by non-professional imaging devices. Researchers have made sustained efforts to increase the resolution of images and videos, both to improve user experience and to boost performance in downstream tasks. However, most existing approaches focus on designing an image-to-image mapping and fail to exploit auxiliary information that is readily available in practice. As a result, such methods often suffer from suboptimal effectiveness and efficiency owing to inadequate information aggregation and large network complexity. In addition, generalizing to uncontrolled scenes, whose degradations can be complex, diverse, and unknown, remains nontrivial. This thesis proposes solutions for effective image and video super-resolution, and for generalization to real-world degradations, by exploiting generative priors and temporal information.

The thesis first demonstrates that a pre-trained Generative Adversarial Network (GAN), e.g., StyleGAN, can be used as a latent bank to improve the restoration quality of large-factor image super-resolution (SR). Our method, Generative LatEnt bANk (GLEAN), goes beyond existing practices by directly leveraging the rich and diverse priors encapsulated in a pre-trained GAN. GLEAN can be easily incorporated into a simple encoder-bank-decoder architecture with multi-resolution skip connections. Images upscaled by GLEAN show clear improvements in fidelity and texture faithfulness compared to existing methods.

Second, we study the underlying mechanism of deformable alignment, which shows compelling performance in aligning multiple frames for video super-resolution. Specifically, we show that deformable convolution can be decomposed into a combination of spatial warping and convolution (see the sketch below), revealing that deformable alignment and flow-based alignment share a common formulation but differ in a key aspect: offset diversity. Based on these observations, we propose an offset-fidelity loss that guides offset learning with optical flow. Experiments show that the loss prevents offset overflow and alleviates the instability of deformable alignment.

Third, we reconsider the most essential components of video super-resolution, guided by four basic functionalities: propagation, alignment, aggregation, and upsampling (a minimal skeleton is sketched below). By reusing existing components with minimal redesign, we obtain a succinct pipeline, BasicVSR, that achieves appealing improvements in speed and restoration quality over many state-of-the-art algorithms. We conduct a systematic analysis to explain how these gains are obtained and discuss the pitfalls. We further show the extensibility of BasicVSR by presenting IconVSR and BasicVSR++. IconVSR adds an information-refill mechanism to alleviate error accumulation and a coupled propagation scheme to facilitate information flow during propagation. BasicVSR++ further enhances propagation and alignment with second-order grid propagation and flow-guided deformable alignment. Our BasicVSR series significantly outperforms existing works in both efficiency and output quality.
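The decomposition claimed above, that deformable alignment amounts to spatial warping followed by convolution, can be illustrated with a small PyTorch sketch. This is an illustrative reconstruction rather than code from the thesis: it assumes the special case in which every kernel sampling location shares a single offset (offset diversity of one), and the helper `flow_warp`, the tensors `feat`, `offset`, `flow`, and the simple L1 stand-in for the offset-fidelity loss are hypothetical names and simplifications introduced here.

```python
# Minimal sketch, not thesis code: deformable alignment viewed as
# "spatial warping + convolution" when all kernel sampling locations
# share one offset (offset diversity = 1).
import torch
import torch.nn.functional as F


def flow_warp(feat, offset):
    """Warp feat (N, C, H, W) by a per-pixel offset (N, 2, H, W), in pixels."""
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=feat.dtype, device=feat.device),
        torch.arange(w, dtype=feat.dtype, device=feat.device),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + offset[:, 0]  # horizontal displacement
    grid_y = ys.unsqueeze(0) + offset[:, 1]  # vertical displacement
    # Normalize to [-1, 1] as expected by grid_sample (align_corners=True).
    grid = torch.stack(
        (2.0 * grid_x / max(w - 1, 1) - 1.0, 2.0 * grid_y / max(h - 1, 1) - 1.0),
        dim=-1,
    )
    return F.grid_sample(feat, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)


feat = torch.randn(1, 16, 32, 32)       # feature of a neighbouring frame
offset = torch.randn(1, 2, 32, 32)      # one offset per spatial location
weight = torch.randn(16, 16, 3, 3)      # an ordinary 3x3 convolution kernel

# Warp first, then convolve: the single-offset case of deformable alignment.
aligned = F.conv2d(flow_warp(feat, offset), weight, padding=1)

# A simplified stand-in for the offset-fidelity idea: keep learned offsets
# close to a precomputed optical flow (the thesis' exact loss may differ).
flow = torch.randn(1, 2, 32, 32)
offset_fidelity_loss = F.l1_loss(offset, flow)
```

A full deformable convolution generalizes this by predicting a distinct offset for each kernel sampling location, which is the offset diversity that separates it from plain flow-based warping.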
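The four functionalities that organize BasicVSR (propagation, alignment, aggregation, and upsampling) can likewise be sketched as a tiny recurrent pipeline. The class below is a minimal skeleton under stated assumptions, not the published BasicVSR implementation: it propagates in one direction only, stubs out flow estimation and alignment, and all names (`TinyRecurrentVSR`, `aggregate`, `upsample`) are hypothetical.

```python
# Minimal sketch, not the published BasicVSR code: a recurrent pipeline
# organized around Propagation, Alignment, Aggregation, and Upsampling.
# Only one propagation direction is shown and alignment is stubbed out.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyRecurrentVSR(nn.Module):
    def __init__(self, channels=64, scale=4):
        super().__init__()
        self.channels, self.scale = channels, scale
        # Aggregation: fuse the current frame with the propagated feature.
        self.aggregate = nn.Sequential(
            nn.Conv2d(3 + channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Upsampling: pixel-shuffle reconstruction to the target resolution.
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def align(self, feat, prev_frame, cur_frame):
        # Alignment placeholder: a real pipeline warps `feat` towards the
        # current frame using optical flow or deformable offsets.
        return feat

    def forward(self, frames):                           # (N, T, 3, H, W)
        n, t, _, h, w = frames.shape
        feat = frames.new_zeros(n, self.channels, h, w)
        outputs = []
        # Propagation: sweep through time, carrying a hidden feature along.
        # (BasicVSR propagates bidirectionally; one direction shown here.)
        for i in range(t):
            cur, prev = frames[:, i], frames[:, max(i - 1, 0)]
            feat = self.align(feat, prev, cur)                     # Alignment
            feat = self.aggregate(torch.cat([cur, feat], dim=1))   # Aggregation
            base = F.interpolate(cur, scale_factor=self.scale,
                                 mode="bilinear", align_corners=False)
            outputs.append(self.upsample(feat) + base)             # Upsampling
        return torch.stack(outputs, dim=1)               # (N, T, 3, sH, sW)


# Example: a 5-frame low-resolution clip upscaled 4x.
video = torch.randn(1, 5, 3, 32, 32)
sr = TinyRecurrentVSR()(video)                           # (1, 5, 3, 128, 128)
```

IconVSR and BasicVSR++ extend this template with information refill, coupled or second-order grid propagation, and flow-guided deformable alignment in place of the stubbed `align`.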
Fourth, we provide solutions to the unique challenges that the diversity and complexity of real-world degradations pose to video super-resolution, in both inference and training. First, we introduce an image pre-cleaning stage that reduces noise and artifacts prior to propagation, substantially improving output quality. Second, we analyze and address the problems arising from the increased computational burden of the task. In addition, to facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences with rich textures and patterns and can serve as a common ground for benchmarking.
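The image pre-cleaning stage described above can also be illustrated. The cleaning network, the iteration budget, and the stopping tolerance below are hypothetical stand-ins chosen for the sketch; the thesis' actual module and refinement schedule may differ.

```python
# Hedged sketch of a pre-cleaning stage for real-world VSR: each frame is
# passed through a lightweight cleaning network to suppress noise and
# artifacts before entering the recurrent super-resolution stage.
import torch
import torch.nn as nn


class CleaningModule(nn.Module):
    """A small residual denoiser applied image-to-image at low resolution."""

    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)        # predict a residual correction


def pre_clean(frames, cleaner, max_iters=3, tol=1e-3):
    """Repeatedly clean each frame until further cleaning changes little."""
    out = frames                       # (N, T, 3, H, W)
    for _ in range(max_iters):
        refined = torch.stack(
            [cleaner(out[:, i]) for i in range(out.size(1))], dim=1)
        if (refined - out).abs().mean() < tol:
            return refined
        out = refined
    return out


lq_video = torch.randn(1, 5, 3, 64, 64)        # a degraded low-quality clip
cleaned = pre_clean(lq_video, CleaningModule())
# `cleaned` would then be fed to a recurrent VSR network for upsampling.
```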