Exploiting diffusion prior for real-world image super-resolution

We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution. Specifically, by employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model, thereby preserving the generative prior and minimizing training cost. To remedy the loss of fidelity caused by the inherent stochasticity of diffusion models, we employ a controllable feature wrapping module that allows users to balance quality and fidelity by simply adjusting a scalar value during the inference process. Moreover, we develop a progressive aggregation sampling strategy to overcome the fixed-size constraints of pre-trained diffusion models, enabling adaptation to resolutions of any size. A comprehensive evaluation of our method using both synthetic and real-world benchmarks demonstrates its superiority over current state-of-the-art approaches. Code and models are available at https://github.com/IceClear/StableSR.
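The controllable feature wrapping module described in the abstract can be illustrated with a short sketch. This is a minimal, hypothetical rendering of the mechanism, not the paper's actual architecture: it assumes a single learned fusion convolution that computes a fidelity-oriented correction from the low-resolution encoder features and the diffusion decoder features, scaled by a user-chosen scalar `w` at inference time. The class name, layer choice, and shapes are all illustrative; see the StableSR repository for the real implementation.

```python
import torch
import torch.nn as nn

class ControllableFeatureWrapping(nn.Module):
    """Sketch of a quality/fidelity trade-off module (hypothetical).

    Blends decoder features from the diffusion model with encoder
    features of the low-resolution input via a learned residual,
    weighted by a user-chosen scalar w.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Small conv that learns a correction from both feature sets.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, dec_feat: torch.Tensor, enc_feat: torch.Tensor, w: float) -> torch.Tensor:
        # w = 0 keeps the generative (quality-oriented) features untouched;
        # w = 1 applies the full fidelity-oriented correction.
        residual = self.fuse(torch.cat([dec_feat, enc_feat], dim=1))
        return dec_feat + w * residual

# Usage: adjust w per image at inference time, no retraining needed.
cfw = ControllableFeatureWrapping(channels=64)
dec = torch.randn(1, 64, 32, 32)
enc = torch.randn(1, 64, 32, 32)
out = cfw(dec, enc, w=0.5)
```

Because `w` only scales an already-trained residual, it can be varied freely at inference to trade perceptual quality (small `w`) against fidelity to the input (large `w`).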
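The progressive aggregation sampling strategy can likewise be approximated by a patch-blending sketch. The code below is an illustrative assumption, not the paper's algorithm: it tiles an image with overlapping fixed-size patches, applies a fixed-size model to each, and blends the results with a Gaussian window to suppress seams. It operates on pixel arrays for simplicity (the actual method works on diffusion latents) and assumes the patch grid covers the image exactly.

```python
import numpy as np

def progressive_aggregation(image, patch, stride, process):
    """Blend overlapping fixed-size model outputs over a larger image.

    image: (H, W, C) array; patch: patch side length; stride: step
    between patch origins; process: a fixed-size model mapping a
    (patch, patch, C) tile to a tile of the same shape. Assumes
    (H - patch) and (W - patch) are multiples of stride so the
    overlapping grid covers the image exactly.
    """
    H, W = image.shape[:2]
    out = np.zeros(image.shape, dtype=np.float64)
    weight = np.zeros((H, W, 1), dtype=np.float64)
    # Gaussian window: down-weights patch borders so overlaps blend smoothly.
    axis = np.linspace(-1.0, 1.0, patch)
    gauss = np.exp(-(axis[:, None] ** 2 + axis[None, :] ** 2) / 0.5)[..., None]
    for top in range(0, H - patch + 1, stride):
        for left in range(0, W - patch + 1, stride):
            tile = process(image[top:top + patch, left:left + patch])
            out[top:top + patch, left:left + patch] += tile * gauss
            weight[top:top + patch, left:left + patch] += gauss
    # Normalize by accumulated window weights.
    return out / weight
```

With an identity `process` the reconstruction is exact, which is a useful sanity check before plugging in a real fixed-size restoration model.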

Bibliographic Details
Main Authors: Wang, Jianyi, Yue, Zongsheng, Zhou, Shangchen, Chan, Kelvin C. K., Loy, Chen Change
Other Authors: College of Computing and Data Science; S-Lab
Format: Article
Language:English
Published: 2024
Subjects:
Online Access:https://hdl.handle.net/10356/180685
Institution: Nanyang Technological University
Record ID: sg-ntu-dr.10356-180685
Journal: International Journal of Computer Vision (ISSN 0920-5691)
Citation: Wang, J., Yue, Z., Zhou, S., Chan, K. C. K. & Loy, C. C. (2024). Exploiting diffusion prior for real-world image super-resolution. International Journal of Computer Vision. https://dx.doi.org/10.1007/s11263-024-02168-7
DOI: 10.1007/s11263-024-02168-7
Funding: National Research Foundation (NRF), Singapore. This study is supported by the National Research Foundation, Singapore under its AI Singapore Programme (AISG Award No: AISG2-PhD-2022-01-033[T]), RIE2020 Industry Alignment Fund Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s).
Acknowledgements: We sincerely thank Yi Li for providing valuable advice and building the WebUI implementation (https://github.com/pkuliyi2015/sd-webui-stablesr) of our work. We also thank the continuous interest and contributions from the community.
Rights: © 2024 The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature. All rights reserved.
Subjects: Computer and Information Science; Image restoration; Diffusion models
Collection: DR-NTU (NTU Library, Nanyang Technological University, Singapore)