TF-ICON: diffusion-based training-free cross-domain image composition

Text-driven diffusion models have exhibited impressive generative capabilities, enabling various image editing tasks. In this paper, we propose TF-ICON, a novel Training-Free Image COmpositioN framework that harnesses the power of text-driven diffusion models for cross-domain image-guided composition. This task aims to seamlessly integrate user-provided objects into a specific visual context. Current diffusion-based methods often involve costly instance-based optimization or finetuning of pretrained models on customized datasets, which can potentially undermine their rich prior. In contrast, TF-ICON can leverage off-the-shelf diffusion models to perform cross-domain image-guided composition without requiring additional training, finetuning, or optimization. Moreover, we introduce the exceptional prompt, which contains no information, to facilitate text-driven diffusion models in accurately inverting real images into latent representations, forming the basis for compositing. Our experiments show that equipping Stable Diffusion with the exceptional prompt outperforms state-of-the-art inversion methods on various datasets (CelebA-HQ, COCO, and ImageNet), and that TF-ICON surpasses prior baselines in versatile visual domains.


Bibliographic Details
Main Authors: Lu, Shilin, Liu, Yanzhu, Kong, Adams Wai Kin
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language:English
Published: 2023
Subjects: Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision; Computer Vision; Generative Model; Diffusion Model; Text-to-Image Model; Image Composition
Online Access:https://hdl.handle.net/10356/172261
https://iccv2023.thecvf.com/
Institution: Nanyang Technological University
Conference: 2023 IEEE/CVF International Conference on Computer Vision (ICCV)
Other Affiliations: Institute for Infocomm Research, A*STAR; Centre for Frontier AI Research, A*STAR
Citation: Lu, S., Liu, Y. & Kong, A. W. K. (2023). TF-ICON: diffusion-based training-free cross-domain image composition. 2023 IEEE/CVF International Conference on Computer Vision (ICCV). https://dx.doi.org/10.1109/ICCV51070.2023.00218
DOI: 10.1109/ICCV51070.2023.00218
Version: Submitted/Accepted version
Funding: This research is supported by the National Research Foundation, Singapore under its Strategic Capability Research Centres Funding Initiative.
Rights: © 2023 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/ICCV51070.2023.00218.