SceneDreamer: unbounded 3D scene generation from 2D image collections

In this work, we present SceneDreamer, an unconditional generative model for unbounded 3D scenes, which synthesizes large-scale 3D landscapes from random noise. Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations. At the core of SceneDreamer is a principled learning paradigm comprising: 1) an efficient yet expressive 3D scene representation, 2) a generative scene parameterization, and 3) an effective renderer that can leverage the knowledge from 2D images. Our approach begins with an efficient bird's-eye-view (BEV) representation generated from simplex noise, which includes a height field for surface elevation and a semantic field for detailed scene semantics. This BEV scene representation enables: 1) representing a 3D scene with quadratic complexity, 2) disentangled geometry and semantics, and 3) efficient training. Moreover, we propose a novel generative neural hash grid to parameterize the latent space based on 3D positions and scene semantics, aiming to encode generalizable features across various scenes. Lastly, a neural volumetric renderer, learned from 2D image collections through adversarial training, is employed to produce photorealistic images. Extensive experiments demonstrate the effectiveness of SceneDreamer and its superiority over state-of-the-art methods in generating vivid yet diverse unbounded 3D worlds.
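The bird's-eye-view (BEV) representation described in the abstract amounts to two aligned 2D maps over the ground plane: a height value and a semantic label per cell. The sketch below is an illustrative approximation, not the authors' code: plain NumPy fractal value noise stands in for the simplex noise the paper uses, and the five-way label scheme (water, sand, grass, rock, snow) is an assumed example.

```python
# Minimal sketch of a BEV scene representation in the spirit of SceneDreamer:
# a height field plus a semantic field on a 2D bird's-eye-view grid.
# Fractal value noise is used as a stand-in for the paper's simplex noise,
# and the semantic label scheme below is an illustrative assumption.
import numpy as np

def fractal_noise(size: int, octaves: int = 5, seed: int = 0) -> np.ndarray:
    """Fractal value noise in [0, 1] on a size x size grid (simplex stand-in)."""
    rng = np.random.default_rng(seed)
    out = np.zeros((size, size))
    amplitude, total = 1.0, 0.0
    for o in range(octaves):
        cells = 2 ** (o + 2)                       # coarse-to-fine lattice resolution
        lattice = rng.random((cells + 1, cells + 1))
        # Bilinearly upsample the random lattice to the full grid.
        xs = np.linspace(0, cells, size)
        x0 = np.clip(np.floor(xs).astype(int), 0, cells - 1)
        t = xs - x0
        rows = lattice[x0] * (1 - t)[:, None] + lattice[x0 + 1] * t[:, None]
        layer = rows[:, x0] * (1 - t)[None, :] + rows[:, x0 + 1] * t[None, :]
        out += amplitude * layer
        total += amplitude
        amplitude *= 0.5
    return out / total

def bev_scene(size: int = 256, seed: int = 0):
    """Return (height, semantics): surface elevation and per-cell labels."""
    height = fractal_noise(size, seed=seed)
    # Illustrative label scheme: 0=water, 1=sand, 2=grass, 3=rock, 4=snow.
    bins = np.array([0.35, 0.40, 0.65, 0.80])
    semantics = np.digitize(height, bins)
    return height, semantics

height, semantics = bev_scene()
print(height.shape, semantics.shape, np.unique(semantics))
```

Because both fields live on an N x N grid rather than a dense N x N x N volume, storage grows quadratically with scene extent, which is the complexity advantage the abstract claims for the BEV representation.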


Bibliographic Details
Main Authors: Chen, Zhaoxi; Wang, Guangcong; Liu, Ziwei
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2024
Subjects: Computer and Information Science; Neural Rendering; Unbounded Scene Generation
Online Access: https://hdl.handle.net/10356/173443
Institution: Nanyang Technological University
id sg-ntu-dr.10356-173443
record_format dspace
spelling sg-ntu-dr.10356-173443 2024-02-06T07:04:26Z
Title: SceneDreamer: unbounded 3D scene generation from 2D image collections
Authors: Chen, Zhaoxi; Wang, Guangcong; Liu, Ziwei
Affiliations: School of Computer Science and Engineering; S-Lab
Subjects: Computer and Information Science; Neural Rendering; Unbounded Scene Generation
Funders: Ministry of Education (MOE); Nanyang Technological University; National Research Foundation (NRF)
Funding statement: This work was supported in part by National Research Foundation, Singapore under its AI Singapore Programme (AISG) under Grant AISG2-PhD-2021-08-019, in part by NTU NAP, in part by MOE AcRF Tier 2 under Grant T2EP20221-0012, and in part by RIE2020 Industry Alignment Fund - Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s).
Record date: 2024-02-05T01:13:20Z; Year of publication: 2023
Type: Journal Article
Citation: Chen, Z., Wang, G. & Liu, Z. (2023). SceneDreamer: unbounded 3D scene generation from 2D image collections. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(12), 15562-15576. https://dx.doi.org/10.1109/TPAMI.2023.3321857
ISSN: 0162-8828
Handle: https://hdl.handle.net/10356/173443
DOI: 10.1109/TPAMI.2023.3321857
PMID: 37788193
Scopus ID: 2-s2.0-85174809772
Volume: 45; Issue: 12; Pages: 15562-15576
Language: en
Grants: AISG2-PhD-2021-08-019; NTU NAP; T2EP20221-0012
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Rights: © 2023 IEEE. All rights reserved.
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
Neural Rendering
Unbounded Scene Generation
description In this work, we present SceneDreamer, an unconditional generative model for unbounded 3D scenes, which synthesizes large-scale 3D landscapes from random noise. Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations. At the core of SceneDreamer is a principled learning paradigm comprising: 1) an efficient yet expressive 3D scene representation, 2) a generative scene parameterization, and 3) an effective renderer that can leverage the knowledge from 2D images. Our approach begins with an efficient bird's-eye-view (BEV) representation generated from simplex noise, which includes a height field for surface elevation and a semantic field for detailed scene semantics. This BEV scene representation enables: 1) representing a 3D scene with quadratic complexity, 2) disentangled geometry and semantics, and 3) efficient training. Moreover, we propose a novel generative neural hash grid to parameterize the latent space based on 3D positions and scene semantics, aiming to encode generalizable features across various scenes. Lastly, a neural volumetric renderer, learned from 2D image collections through adversarial training, is employed to produce photorealistic images. Extensive experiments demonstrate the effectiveness of SceneDreamer and its superiority over state-of-the-art methods in generating vivid yet diverse unbounded 3D worlds.
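As a companion to the description above, the following is a minimal sketch of a multiresolution hash-grid feature lookup indexed by 3D position, in the spirit of the hash encodings that the paper's generative neural hash grid builds on. It is an assumption-laden illustration rather than the paper's implementation: the class name HashGridSketch and all hyperparameters are invented for the example, the paper's conditioning on scene semantics and a scene-level latent is reduced to a naive feature concatenation, and a nearest-corner lookup replaces the usual trilinear blend of corner features.

```python
# Illustrative sketch only (not SceneDreamer's code): a multi-resolution
# hash-grid feature lookup conditioned on 3D position, with scene semantics
# crudely appended by concatenation. All names and hyperparameters are assumed.
import torch
import torch.nn as nn

class HashGridSketch(nn.Module):
    PRIMES = (1, 2654435761, 805459861)  # commonly used spatial-hashing primes

    def __init__(self, levels=8, table_size=2**14, feat_dim=2, base_res=16):
        super().__init__()
        self.levels, self.table_size, self.base_res = levels, table_size, base_res
        # One learnable feature table per resolution level.
        self.tables = nn.ModuleList(
            [nn.Embedding(table_size, feat_dim) for _ in range(levels)]
        )

    def _hash(self, coords: torch.Tensor) -> torch.Tensor:
        # coords: (N, 3) integer lattice coordinates -> (N,) table indices.
        h = torch.zeros(coords.shape[0], dtype=torch.long, device=coords.device)
        for d, p in enumerate(self.PRIMES):
            h ^= coords[:, d] * p
        return h % self.table_size

    def forward(self, xyz: torch.Tensor, sem: torch.Tensor) -> torch.Tensor:
        # xyz: (N, 3) positions in [0, 1); sem: (N, C) semantic features.
        feats = []
        for lvl, table in enumerate(self.tables):
            res = self.base_res * (2 ** lvl)
            # Nearest-corner lookup; real hash grids trilinearly blend 8 corners.
            idx = self._hash((xyz * res).long())
            feats.append(table(idx))
        return torch.cat(feats + [sem], dim=-1)

grid = HashGridSketch()
out = grid(torch.rand(4, 3), torch.rand(4, 5))
print(out.shape)  # (4, levels * feat_dim + 5) = (4, 21)
```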
author2 School of Computer Science and Engineering
format Article
author Chen, Zhaoxi
Wang, Guangcong
Liu, Ziwei
author_sort Chen, Zhaoxi
title SceneDreamer: unbounded 3D scene generation from 2D image collections
publishDate 2024
url https://hdl.handle.net/10356/173443