TF-ICON: diffusion-based training-free cross-domain image composition
Text-driven diffusion models have exhibited impressive generative capabilities, enabling various image editing tasks. In this paper, we propose TF-ICON, a novel Training-Free Image COmpositioN framework that harnesses the power of text-driven diffusion models for cross-domain image-guided composition. This task aims to seamlessly integrate user-provided objects into a specific visual context. Current diffusion-based methods often involve costly instance-based optimization or finetuning of pretrained models on customized datasets, which can potentially undermine their rich prior. In contrast, TF-ICON can leverage off-the-shelf diffusion models to perform cross-domain image-guided composition without requiring additional training, finetuning, or optimization. Moreover, we introduce the exceptional prompt, which contains no information, to facilitate text-driven diffusion models in accurately inverting real images into latent representations, forming the basis for compositing. Our experiments show that equipping Stable Diffusion with the exceptional prompt outperforms state-of-the-art inversion methods on various datasets (CelebA-HQ, COCO, and ImageNet), and that TF-ICON surpasses prior baselines in versatile visual domains.
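The inversion the abstract refers to is deterministic DDIM inversion: running the sampler backwards to map a real image to a noise latent, so that denoising from that latent reconstructs the image. As a rough illustration only (not the paper's implementation, which runs Stable Diffusion's U-Net conditioned on the "exceptional prompt"), here is a toy NumPy sketch with a stand-in noise predictor:

```python
import numpy as np

# Toy linear noise schedule; real Stable Diffusion uses T = 1000 trained steps.
T = 50
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_t)

def eps_model(x, t):
    """Stand-in for the U-Net noise predictor eps_theta(x_t, t, prompt).
    A fixed map is used here so the DDIM consistency assumption holds exactly."""
    return 0.1 * np.ones_like(x)

def ddim_invert(x0):
    """Map a clean image x0 to a noise latent by running DDIM backwards."""
    x = x0.copy()
    for t in range(T - 1):
        ab_t, ab_next = alpha_bar[t], alpha_bar[t + 1]
        eps = eps_model(x, t)
        x0_pred = (x - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
        x = np.sqrt(ab_next) * x0_pred + np.sqrt(1.0 - ab_next) * eps
    return x

def ddim_sample(xT):
    """Deterministic DDIM denoising (eta = 0), the inverse of ddim_invert."""
    x = xT.copy()
    for t in range(T - 1, 0, -1):
        ab_t, ab_prev = alpha_bar[t], alpha_bar[t - 1]
        eps = eps_model(x, t)
        x0_pred = (x - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
        x = np.sqrt(ab_prev) * x0_pred + np.sqrt(1.0 - ab_prev) * eps
    return x

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
latent = ddim_invert(image)           # image -> noise latent
reconstruction = ddim_sample(latent)  # noise latent -> image
print(np.abs(reconstruction - image).max())  # near zero: faithful round trip
```

With a real U-Net, the predicted noise at adjacent timesteps is only approximately equal, so inversion accumulates error; the paper's observation is that conditioning on an information-free prompt keeps that drift small enough for accurate reconstruction, which is what the composition step builds on.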
Main Authors: | Lu, Shilin; Liu, Yanzhu; Kong, Adams Wai Kin |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2023 |
Subjects: | Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision; Computer Vision; Generative Model; Diffusion Model; Text-to-Image Model; Image Composition |
Online Access: | https://hdl.handle.net/10356/172261 https://iccv2023.thecvf.com/ |
Institution: | Nanyang Technological University |
Conference: | 2023 IEEE/CVF International Conference on Computer Vision (ICCV) |
Affiliations: | School of Computer Science and Engineering, Nanyang Technological University; Institute for Infocomm Research, A*STAR; Centre for Frontier AI Research, A*STAR |
Citation: | Lu, S., Liu, Y. & Kong, A. W. K. (2023). TF-ICON: diffusion-based training-free cross-domain image composition. 2023 IEEE/CVF International Conference on Computer Vision (ICCV). https://dx.doi.org/10.1109/ICCV51070.2023.00218 |
DOI: | 10.1109/ICCV51070.2023.00218 |
Version: | Submitted/Accepted version |
Funding: | This research is supported by the National Research Foundation, Singapore under its Strategic Capability Research Centres Funding Initiative. |
Rights: | © 2023 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/ICCV51070.2023.00218. |