MVGamba : Unify 3D content generation as state space sequence modeling
Recent 3D large reconstruction models (LRMs) can generate high-quality 3D content in sub-seconds by integrating multi-view diffusion models with scalable multi-view reconstructors. Current works further leverage 3D Gaussian Splatting as 3D representation for improved visual quality and rendering eff...
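Main Authors: | YI, Xuanyu; WU, Zike; SHEN, Qiuhong; XU, Qingshan; ZHOU, Pan; LIM, Joo-Hwee; YAN, Shuicheng; WANG, Xinchao; ZHANG, Hanwang |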
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2024 |
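Subjects: | Large Reconstruction Models; LRMs; Gaussian reconstruction model; 3D content generation; Artificial Intelligence and Robotics; Graphics and Human Computer Interfaces |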
Online Access: | https://ink.library.smu.edu.sg/sis_research/9491 https://ink.library.smu.edu.sg/context/sis_research/article/10491/viewcontent/MVGamba.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-10491 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-10491; 2024-11-11T06:05:56Z; MVGamba : Unify 3D content generation as state space sequence modeling; YI, Xuanyu; WU, Zike; SHEN, Qiuhong; XU, Qingshan; ZHOU, Pan; LIM, Joo-Hwee; YAN, Shuicheng; WANG, Xinchao; ZHANG, Hanwang; 2024-12-01T08:00:00Z; text; application/pdf; https://ink.library.smu.edu.sg/sis_research/9491; info:doi/doi.org/10.48550/arXiv.2406.06367; https://ink.library.smu.edu.sg/context/sis_research/article/10491/viewcontent/MVGamba.pdf; http://creativecommons.org/licenses/by-nc-nd/4.0/; Research Collection School Of Computing and Information Systems; eng; Institutional Knowledge at Singapore Management University; Large Reconstruction Models; LRMs; Gaussian reconstruction model; 3D content generation; Artificial Intelligence and Robotics; Graphics and Human Computer Interfaces |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Large Reconstruction Models; LRMs; Gaussian reconstruction model; 3D content generation; Artificial Intelligence and Robotics; Graphics and Human Computer Interfaces |
spellingShingle |
Large Reconstruction Models; LRMs; Gaussian reconstruction model; 3D content generation; Artificial Intelligence and Robotics; Graphics and Human Computer Interfaces; YI, Xuanyu; WU, Zike; SHEN, Qiuhong; XU, Qingshan; ZHOU, Pan; LIM, Joo-Hwee; YAN, Shuicheng; WANG, Xinchao; ZHANG, Hanwang; MVGamba : Unify 3D content generation as state space sequence modeling |
description |
Recent 3D large reconstruction models (LRMs) can generate high-quality 3D content in sub-seconds by integrating multi-view diffusion models with scalable multi-view reconstructors. Current works further leverage 3D Gaussian Splatting as 3D representation for improved visual quality and rendering efficiency. However, we observe that existing Gaussian reconstruction models often suffer from multi-view inconsistency and blurred textures. We attribute this to the compromise of multi-view information propagation in favor of adopting powerful yet computationally intensive architectures (e.g., Transformers). To address this issue, we introduce MVGamba, a general and lightweight Gaussian reconstruction model featuring a multi-view Gaussian reconstructor based on the RNN-like State Space Model (SSM). Our Gaussian reconstructor propagates causal context containing multi-view information for cross-view self-refinement while generating a long sequence of Gaussians for fine-detail modeling with linear complexity. With off-the-shelf multi-view diffusion models integrated, MVGamba unifies 3D generation tasks from a single image, sparse images, or text prompts. Extensive experiments demonstrate that MVGamba outperforms state-of-the-art baselines in all 3D content generation scenarios with approximately only 0.1× of the model size. |
format |
text |
author |
YI, Xuanyu; WU, Zike; SHEN, Qiuhong; XU, Qingshan; ZHOU, Pan; LIM, Joo-Hwee; YAN, Shuicheng; WANG, Xinchao; ZHANG, Hanwang |
author_facet |
YI, Xuanyu; WU, Zike; SHEN, Qiuhong; XU, Qingshan; ZHOU, Pan; LIM, Joo-Hwee; YAN, Shuicheng; WANG, Xinchao; ZHANG, Hanwang |
author_sort |
YI, Xuanyu |
title |
MVGamba : Unify 3D content generation as state space sequence modeling |
title_short |
MVGamba : Unify 3D content generation as state space sequence modeling |
title_full |
MVGamba : Unify 3D content generation as state space sequence modeling |
title_fullStr |
MVGamba : Unify 3D content generation as state space sequence modeling |
title_full_unstemmed |
MVGamba : Unify 3D content generation as state space sequence modeling |
title_sort |
mvgamba : unify 3d content generation as state space sequence modeling |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2024 |
url |
https://ink.library.smu.edu.sg/sis_research/9491 https://ink.library.smu.edu.sg/context/sis_research/article/10491/viewcontent/MVGamba.pdf |
_version_ |
1816859093095153664 |