TF-ICON: diffusion-based training-free cross-domain image composition
Text-driven diffusion models have exhibited impressive generative capabilities, enabling various image editing tasks. In this paper, we propose TF-ICON, a novel Training-Free Image COmpositioN framework that harnesses the power of text-driven diffusion models for cross-domain image-guided composition. This task aims to seamlessly integrate user-provided objects into a specific visual context. Current diffusion-based methods often involve costly instance-based optimization or finetuning of pretrained models on customized datasets, which can potentially undermine their rich prior. In contrast, TF-ICON can leverage off-the-shelf diffusion models to perform cross-domain image-guided composition without requiring additional training, finetuning, or optimization. Moreover, we introduce the exceptional prompt, which contains no information, to facilitate text-driven diffusion models in accurately inverting real images into latent representations, forming the basis for compositing. Our experiments show that equipping Stable Diffusion with the exceptional prompt outperforms state-of-the-art inversion methods on various datasets (CelebA-HQ, COCO, and ImageNet), and that TF-ICON surpasses prior baselines in versatile visual domains.
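The inversion the abstract refers to is deterministic DDIM inversion: running the sampler backwards to map a real image to a noise latent, so that denoising from that latent reconstructs the image. As a rough illustration only (not the paper's implementation, which runs Stable Diffusion's U-Net conditioned on the "exceptional prompt"), here is a toy NumPy sketch with a stand-in noise predictor:

```python
import numpy as np

# Toy linear noise schedule; real Stable Diffusion uses T = 1000 trained steps.
T = 50
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_t)

def eps_model(x, t):
    """Stand-in for the U-Net noise predictor eps_theta(x_t, t, prompt).
    A fixed map is used here so the DDIM consistency assumption holds exactly."""
    return 0.1 * np.ones_like(x)

def ddim_invert(x0):
    """Map a clean image x0 to a noise latent by running DDIM backwards."""
    x = x0.copy()
    for t in range(T - 1):
        ab_t, ab_next = alpha_bar[t], alpha_bar[t + 1]
        eps = eps_model(x, t)
        x0_pred = (x - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
        x = np.sqrt(ab_next) * x0_pred + np.sqrt(1.0 - ab_next) * eps
    return x

def ddim_sample(xT):
    """Deterministic DDIM denoising (eta = 0), the inverse of ddim_invert."""
    x = xT.copy()
    for t in range(T - 1, 0, -1):
        ab_t, ab_prev = alpha_bar[t], alpha_bar[t - 1]
        eps = eps_model(x, t)
        x0_pred = (x - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
        x = np.sqrt(ab_prev) * x0_pred + np.sqrt(1.0 - ab_prev) * eps
    return x

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
latent = ddim_invert(image)           # image -> noise latent
reconstruction = ddim_sample(latent)  # noise latent -> image
print(np.abs(reconstruction - image).max())  # near zero: faithful round trip
```

With a real U-Net, the predicted noise at adjacent timesteps is only approximately equal, so inversion accumulates error; the paper's observation is that conditioning on an information-free prompt keeps that drift small enough for accurate reconstruction, which is what the composition step builds on.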
Main Authors: | Lu, Shilin; Liu, Yanzhu; Kong, Adams Wai Kin |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2023 |
Subjects: | Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision; Computer Vision; Generative Model; Diffusion Model; Text-to-Image Model; Image Composition |
Online Access: | https://hdl.handle.net/10356/172261 https://iccv2023.thecvf.com/ |
Institution: | Nanyang Technological University |
Conference: | 2023 IEEE/CVF International Conference on Computer Vision (ICCV) |
Affiliations: | School of Computer Science and Engineering, Nanyang Technological University; Institute for Infocomm Research, A*STAR; Centre for Frontier AI Research, A*STAR |
Citation: | Lu, S., Liu, Y. & Kong, A. W. K. (2023). TF-ICON: diffusion-based training-free cross-domain image composition. 2023 IEEE/CVF International Conference on Computer Vision (ICCV). https://dx.doi.org/10.1109/ICCV51070.2023.00218 |
DOI: | 10.1109/ICCV51070.2023.00218 |
Version: | Submitted/Accepted version |
Funding: | This research is supported by the National Research Foundation, Singapore under its Strategic Capability Research Centres Funding Initiative. |
Rights: | © 2023 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/ICCV51070.2023.00218. |