MVGamba : Unify 3D content generation as state space sequence modeling

Recent 3D large reconstruction models (LRMs) can generate high-quality 3D content in sub-seconds by integrating multi-view diffusion models with scalable multi-view reconstructors. Current works further leverage 3D Gaussian Splatting as 3D representation for improved visual quality and rendering eff...

Bibliographic Details
Main Authors: YI, Xuanyu, WU, Zike, SHEN, Qiuhong, XU, Qingshan, ZHOU, Pan, LIM, Joo-Hwee, YAN, Shuicheng, WANG, Xinchao, ZHANG, Hanwang
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2024
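Subjects: Large Reconstruction Models; LRMs; Gaussian reconstruction model; 3D content generation; Artificial Intelligence and Robotics; Graphics and Human Computer Interfaces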
Online Access: https://ink.library.smu.edu.sg/sis_research/9491
https://ink.library.smu.edu.sg/context/sis_research/article/10491/viewcontent/MVGamba.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-10491
record_format dspace
spelling sg-smu-ink.sis_research-10491
2024-11-11T06:05:56Z
MVGamba : Unify 3D content generation as state space sequence modeling
YI, Xuanyu
WU, Zike
SHEN, Qiuhong
XU, Qingshan
ZHOU, Pan
LIM, Joo-Hwee
YAN, Shuicheng
WANG, Xinchao
ZHANG, Hanwang
Recent 3D large reconstruction models (LRMs) can generate high-quality 3D content in sub-seconds by integrating multi-view diffusion models with scalable multi-view reconstructors. Current works further leverage 3D Gaussian Splatting as 3D representation for improved visual quality and rendering efficiency. However, we observe that existing Gaussian reconstruction models often suffer from multi-view inconsistency and blurred textures. We attribute this to the compromise of multi-view information propagation in favor of adopting powerful yet computationally intensive architectures (e.g., Transformers). To address this issue, we introduce MVGamba, a general and lightweight Gaussian reconstruction model featuring a multi-view Gaussian reconstructor based on the RNN-like State Space Model (SSM). Our Gaussian reconstructor propagates causal context containing multi-view information for cross-view self-refinement while generating a long sequence of Gaussians for fine-detail modeling with linear complexity. With off-the-shelf multi-view diffusion models integrated, MVGamba unifies 3D generation tasks from a single image, sparse images, or text prompts. Extensive experiments demonstrate that MVGamba outperforms state-of-the-art baselines in all 3D content generation scenarios with approximately only 0.1× of the model size.
2024-12-01T08:00:00Z
text
application/pdf
https://ink.library.smu.edu.sg/sis_research/9491
info:doi/doi.org/10.48550/arXiv.2406.06367
https://ink.library.smu.edu.sg/context/sis_research/article/10491/viewcontent/MVGamba.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
Large Reconstruction Models
LRMs
Gaussian reconstruction model
3D content generation
Artificial Intelligence and Robotics
Graphics and Human Computer Interfaces
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Large Reconstruction Models
LRMs
Gaussian reconstruction model
3D content generation
Artificial Intelligence and Robotics
Graphics and Human Computer Interfaces
spellingShingle Large Reconstruction Models
LRMs
Gaussian reconstruction model
3D content generation
Artificial Intelligence and Robotics
Graphics and Human Computer Interfaces
YI, Xuanyu
WU, Zike
SHEN, Qiuhong
XU, Qingshan
ZHOU, Pan
LIM, Joo-Hwee
YAN, Shuicheng
WANG, Xinchao
ZHANG, Hanwang
MVGamba : Unify 3D content generation as state space sequence modeling
description Recent 3D large reconstruction models (LRMs) can generate high-quality 3D content in sub-seconds by integrating multi-view diffusion models with scalable multi-view reconstructors. Current works further leverage 3D Gaussian Splatting as 3D representation for improved visual quality and rendering efficiency. However, we observe that existing Gaussian reconstruction models often suffer from multi-view inconsistency and blurred textures. We attribute this to the compromise of multi-view information propagation in favor of adopting powerful yet computationally intensive architectures (e.g., Transformers). To address this issue, we introduce MVGamba, a general and lightweight Gaussian reconstruction model featuring a multi-view Gaussian reconstructor based on the RNN-like State Space Model (SSM). Our Gaussian reconstructor propagates causal context containing multi-view information for cross-view self-refinement while generating a long sequence of Gaussians for fine-detail modeling with linear complexity. With off-the-shelf multi-view diffusion models integrated, MVGamba unifies 3D generation tasks from a single image, sparse images, or text prompts. Extensive experiments demonstrate that MVGamba outperforms state-of-the-art baselines in all 3D content generation scenarios with approximately only 0.1× of the model size.
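As a rough illustration of the "RNN-like" recurrence and the linear complexity mentioned in the description, the following minimal Python sketch runs a plain, time-invariant, diagonal linear state space model over a scalar input sequence. It is not the authors' implementation: this record does not describe MVGamba's reconstructor in detail, and the function name `ssm_scan`, the diagonal parameterization, and all parameter values are assumptions chosen only for illustration.

```python
from typing import List, Sequence


def ssm_scan(
    x: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
) -> List[float]:
    """Run a diagonal linear state space recurrence over a scalar sequence.

    For each step t (elementwise over the state dimensions):
        h_t = a * h_{t-1} + b * x_t
        y_t = sum(c * h_t)

    The hidden state h carries context from all earlier inputs, so each
    output depends only on the current and past inputs (causal), and the
    loop visits every input exactly once (linear in sequence length).
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("a, b and c must have the same state dimension")

    h = [0.0] * len(a)  # hidden state, initialised to zero
    ys: List[float] = []
    for x_t in x:
        # State update: decay the old state and inject the new input.
        h = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, h, b)]
        # Readout: project the state to a scalar output.
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys


if __name__ == "__main__":
    # Impulse input with a single state dimension that decays by 0.5 per step.
    print(ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))
```

In this sketch, changing a later input leaves all earlier outputs unchanged, which is the causal property the description alludes to. Selective state space models in the Mamba family additionally make the parameters depend on the input; that is omitted here for brevity.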
format text
author YI, Xuanyu
WU, Zike
SHEN, Qiuhong
XU, Qingshan
ZHOU, Pan
LIM, Joo-Hwee
YAN, Shuicheng
WANG, Xinchao
ZHANG, Hanwang
author_facet YI, Xuanyu
WU, Zike
SHEN, Qiuhong
XU, Qingshan
ZHOU, Pan
LIM, Joo-Hwee
YAN, Shuicheng
WANG, Xinchao
ZHANG, Hanwang
author_sort YI, Xuanyu
title MVGamba : Unify 3D content generation as state space sequence modeling
title_short MVGamba : Unify 3D content generation as state space sequence modeling
title_full MVGamba : Unify 3D content generation as state space sequence modeling
title_fullStr MVGamba : Unify 3D content generation as state space sequence modeling
title_full_unstemmed MVGamba : Unify 3D content generation as state space sequence modeling
title_sort mvgamba : unify 3d content generation as state space sequence modeling
publisher Institutional Knowledge at Singapore Management University
publishDate 2024
url https://ink.library.smu.edu.sg/sis_research/9491
https://ink.library.smu.edu.sg/context/sis_research/article/10491/viewcontent/MVGamba.pdf
_version_ 1816859093095153664