Instant3D: Instant Text-to-3D Generation

Text-to-3D generation has attracted much attention from the computer vision community. Existing methods mainly optimize a neural field from scratch for each text prompt, relying on heavy and repetitive training cost which impedes their practical deployment. In this paper, we propose a novel framewor...

Bibliographic Details
Main Authors:	LI, Ming; ZHOU, Pan; LIU, Jia-Wei; KEPPO, Jussi; LIN, Min; YAN, Shuicheng; XU, Xiangyu
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Large-scale generative models; Neural radiance fields; Text-to-3D generation; Graphics and Human Computer Interfaces; Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/8816 https://ink.library.smu.edu.sg/context/sis_research/article/9819/viewcontent/Instant3D_av.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-9819
record_format	dspace
spelling	sg-smu-ink.sis_research-9819 2024-05-30T07:26:20Z Instant3D: Instant Text-to-3D Generation LI, Ming ZHOU, Pan LIU, Jia-Wei KEPPO, Jussi LIN, Min YAN, Shuicheng XU, Xiangyu Text-to-3D generation has attracted much attention from the computer vision community. Existing methods mainly optimize a neural field from scratch for each text prompt, relying on heavy and repetitive training cost which impedes their practical deployment. In this paper, we propose a novel framework for fast text-to-3D generation, dubbed Instant3D. Once trained, Instant3D is able to create a 3D object for an unseen text prompt in less than one second with a single run of a feedforward network. We achieve this remarkable speed by devising a new network that directly constructs a 3D triplane from a text prompt. The core innovation of our Instant3D lies in our exploration of strategies to effectively inject text conditions into the network. In particular, we propose to combine three key mechanisms: cross-attention, style injection, and token-to-plane transformation, which collectively ensure precise alignment of the output with the input text. Furthermore, we propose a simple yet effective activation function, the scaled-sigmoid, to replace the original sigmoid function, which speeds up the training convergence by more than ten times. Finally, to address the Janus (multi-head) problem in 3D generation, we propose an adaptive Perp-Neg algorithm that can dynamically adjust its concept negation scales according to the severity of the Janus problem during training, effectively reducing the multi-head effect. Extensive experiments on a wide variety of benchmark datasets demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods both qualitatively and quantitatively, while achieving significantly better efficiency. The code, data, and models are available at https://ming1993li.github.io/Instant3DProj/.
2024-01-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/8816 info:doi/10.1007/s11263-024-02097-5 https://ink.library.smu.edu.sg/context/sis_research/article/9819/viewcontent/Instant3D_av.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Large-scale generative models Neural radiance fields Text-to-3D generation Graphics and Human Computer Interfaces Software Engineering
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Large-scale generative models Neural radiance fields Text-to-3D generation Graphics and Human Computer Interfaces Software Engineering
spellingShingle	Large-scale generative models Neural radiance fields Text-to-3D generation Graphics and Human Computer Interfaces Software Engineering LI, Ming ZHOU, Pan LIU, Jia-Wei KEPPO, Jussi LIN, Min YAN, Shuicheng XU, Xiangyu Instant3D: Instant Text-to-3D Generation
description	Text-to-3D generation has attracted much attention from the computer vision community. Existing methods mainly optimize a neural field from scratch for each text prompt, relying on heavy and repetitive training cost which impedes their practical deployment. In this paper, we propose a novel framework for fast text-to-3D generation, dubbed Instant3D. Once trained, Instant3D is able to create a 3D object for an unseen text prompt in less than one second with a single run of a feedforward network. We achieve this remarkable speed by devising a new network that directly constructs a 3D triplane from a text prompt. The core innovation of our Instant3D lies in our exploration of strategies to effectively inject text conditions into the network. In particular, we propose to combine three key mechanisms: cross-attention, style injection, and token-to-plane transformation, which collectively ensure precise alignment of the output with the input text. Furthermore, we propose a simple yet effective activation function, the scaled-sigmoid, to replace the original sigmoid function, which speeds up the training convergence by more than ten times. Finally, to address the Janus (multi-head) problem in 3D generation, we propose an adaptive Perp-Neg algorithm that can dynamically adjust its concept negation scales according to the severity of the Janus problem during training, effectively reducing the multi-head effect. Extensive experiments on a wide variety of benchmark datasets demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods both qualitatively and quantitatively, while achieving significantly better efficiency. The code, data, and models are available at https://ming1993li.github.io/Instant3DProj/.
format	text
author	LI, Ming ZHOU, Pan LIU, Jia-Wei KEPPO, Jussi LIN, Min YAN, Shuicheng XU, Xiangyu
author_facet	LI, Ming ZHOU, Pan LIU, Jia-Wei KEPPO, Jussi LIN, Min YAN, Shuicheng XU, Xiangyu
author_sort	LI, Ming
title	Instant3D: Instant Text-to-3D Generation
title_short	Instant3D: Instant Text-to-3D Generation
title_full	Instant3D: Instant Text-to-3D Generation
title_fullStr	Instant3D: Instant Text-to-3D Generation
title_full_unstemmed	Instant3D: Instant Text-to-3D Generation
title_sort	instant3d: instant text-to-3d generation
publisher	Institutional Knowledge at Singapore Management University
publishDate	2024
url	https://ink.library.smu.edu.sg/sis_research/8816 https://ink.library.smu.edu.sg/context/sis_research/article/9819/viewcontent/Instant3D_av.pdf
_version_	1814047564997066752

Similar Items