Scene coordinate encoding for 3D representation and relocalization

Visual localization, estimating the camera’s position and orientation within a known scene, is crucial for robotics, autonomous driving, and AR/VR applications. Scene Coordinate Regression (SCR) is a structure-based visual localization method that implicitly encodes the map information within the we...

Bibliographic Details
Main Author: Jiang, Zeyu
Other Authors: Xie Lihua
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2025
Subjects: Computer and Information Science
Online Access: https://hdl.handle.net/10356/182472
Institution: Nanyang Technological University
id sg-ntu-dr.10356-182472
record_format dspace
spelling sg-ntu-dr.10356-182472 2025-02-07T15:48:23Z Scene coordinate encoding for 3D representation and relocalization Jiang, Zeyu Xie Lihua School of Electrical and Electronic Engineering ELHXIE@ntu.edu.sg Computer and Information Science Master's degree 2025-02-04T08:03:39Z 2025 Thesis-Master by Coursework Jiang, Z. (2025). Scene coordinate encoding for 3D representation and relocalization. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/182472 en application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
spellingShingle Computer and Information Science
Jiang, Zeyu
Scene coordinate encoding for 3D representation and relocalization
description Visual localization, the task of estimating a camera’s position and orientation within a known scene, is crucial for robotics, autonomous driving, and AR/VR applications. Scene Coordinate Regression (SCR) is a structure-based visual localization method that implicitly encodes map information in the weights of a deep neural network and directly regresses 2D-3D matches. However, because SCR relies on implicit triangulation for its 3D representation, the limited learning capacity of deep neural networks leaves current SCR methods inadequate for representing repetitive textures and meaningless areas. We investigate the problems of SCR with respect to scene feature representation and relocalization accuracy, and we improve on previous methods in both scene coordinate encoding and inter-frame localization optimization. We design a network architecture that simultaneously encodes 3D scenes and extracts salient keypoints. Moreover, we introduce a mechanism that leverages sequential information during map encoding and relocalization to strengthen implicit triangulation, particularly in environments with repetitive textures. Comparative experiments are conducted on indoor and outdoor datasets against current state-of-the-art (SOTA) visual relocalization methods. Our single-frame and sequence-based relocalization modes outperform the other SOTA methods in both frame rate and localization accuracy.
author2 Xie Lihua
author_facet Xie Lihua
Jiang, Zeyu
format Thesis-Master by Coursework
author Jiang, Zeyu
author_sort Jiang, Zeyu
title Scene coordinate encoding for 3D representation and relocalization
title_sort scene coordinate encoding for 3d representation and relocalization
publisher Nanyang Technological University
publishDate 2025
url https://hdl.handle.net/10356/182472
_version_ 1823807362296709120
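
The description above says that SCR "directly regresses the 2D-3D matches". As a rough illustration of what such a match is used for, the sketch below scores a candidate camera pose by reprojecting predicted scene coordinates into the image and counting how many land near their pixels. This inlier count is the kind of score a PnP solver inside a RANSAC loop typically maximizes in SCR pipelines; the abstract does not say which solver the thesis uses. The function names, the pinhole intrinsics, the 10-pixel threshold, and all numbers are hypothetical and are not taken from the thesis.

```python
from math import hypot


def project(point_world, rotation, translation, focal, principal):
    """Project a 3D scene coordinate into the image with a pinhole model."""
    # World -> camera frame: X_cam = R * X_world + t
    x_cam = [
        sum(rotation[r][c] * point_world[c] for c in range(3)) + translation[r]
        for r in range(3)
    ]
    # Perspective division, then focal length and principal point
    u = focal * x_cam[0] / x_cam[2] + principal[0]
    v = focal * x_cam[1] / x_cam[2] + principal[1]
    return (u, v)


def count_inliers(matches, rotation, translation, focal, principal,
                  threshold_px=10.0):
    """Count 2D-3D matches whose reprojection error is below the threshold."""
    inliers = 0
    for pixel, scene_coord in matches:
        u, v = project(scene_coord, rotation, translation, focal, principal)
        if hypot(u - pixel[0], v - pixel[1]) < threshold_px:
            inliers += 1
    return inliers


# Hypothetical data: each entry pairs a pixel with the 3D scene coordinate
# that a network might predict for it. The last match is deliberately wrong.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
FOCAL, PRINCIPAL = 500.0, (320.0, 240.0)
matches = [
    ((320.0, 240.0), (0.0, 0.0, 2.0)),
    ((420.0, 240.0), (0.4, 0.0, 2.0)),
    ((320.0, 140.0), (0.0, -0.4, 2.0)),
    ((100.0, 100.0), (1.0, 1.0, 2.0)),  # outlier: inconsistent with the pose
]

# A pose consistent with the matches versus one shifted 0.5 units sideways
good = count_inliers(matches, IDENTITY, [0.0, 0.0, 0.0], FOCAL, PRINCIPAL)
shifted = count_inliers(matches, IDENTITY, [0.5, 0.0, 0.0], FOCAL, PRINCIPAL)
print(good, shifted)
```

A pose that agrees with the regressed coordinates collects more inliers than a shifted one, which is why ambiguous predictions, for example in repetitive textures, degrade the final pose estimate.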