Scene coordinate encoding for 3D representation and relocalization

Bibliographic Details
Main Author:	Jiang, Zeyu
Other Authors:	Xie Lihua
Format:	Thesis-Master by Coursework
Language:	English
Published:	Nanyang Technological University 2025
Subjects:	Computer and Information Science
Online Access:	https://hdl.handle.net/10356/182472
Institution:	Nanyang Technological University

Description
Summary:	Visual localization, the task of estimating a camera’s position and orientation within a known scene, is crucial for robotics, autonomous driving, and AR/VR applications. Scene Coordinate Regression (SCR) is a structure-based visual localization method that implicitly encodes map information in the weights of a deep neural network and directly regresses 2D-3D matches. However, because SCR relies on implicit triangulation for its 3D representation, the limited learning capacity of deep neural networks leaves current SCR methods inadequate for representing repetitive textures and uninformative areas. We investigate these shortcomings of SCR with respect to scene feature representation and relocalization accuracy, and we improve on previous methods in both scene coordinate encoding and inter-frame localization optimization. We design a network architecture that simultaneously encodes the 3D scene and extracts salient keypoints. We also introduce a mechanism that uses sequential information during map encoding and relocalization to strengthen implicit triangulation, particularly in environments with repetitive textures. Comparative experiments against current state-of-the-art (SOTA) visual relocalization methods are conducted on indoor and outdoor datasets. Both our single-frame and sequence-based relocalization modes outperform the other SOTA methods in frame rate and localization accuracy.
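
To make the role of the regressed 2D-3D matches concrete, the sketch below scores a candidate camera pose by counting how many matches reproject close to their observed pixels. This is a generic illustration, not the thesis's method: the pinhole model, the function names (`project`, `count_inliers`), the intrinsics and the 10-pixel threshold are all assumptions introduced here. Robust pose solvers that are commonly paired with SCR networks use this kind of inlier count to rank pose hypotheses.

```python
import math


def project(point_w, R, t, f, cx, cy):
    """Project a 3D scene coordinate into the image with a pinhole camera.

    R (3x3 nested lists) and t (length 3) map world coordinates to camera
    coordinates; f, cx, cy are assumed intrinsics (focal length, principal point).
    Returns None when the point lies behind the camera.
    """
    xc = [sum(R[i][j] * point_w[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return None
    return (f * xc[0] / xc[2] + cx, f * xc[1] / xc[2] + cy)


def count_inliers(matches, R, t, f, cx, cy, threshold_px=10.0):
    """Count 2D-3D matches whose reprojection error is below threshold_px.

    matches is a list of ((u, v), (x, y, z)) pairs: an image pixel and the
    scene coordinate predicted for it (hypothetical network output).
    """
    inliers = 0
    for (u, v), point_w in matches:
        uv = project(point_w, R, t, f, cx, cy)
        if uv is not None and math.hypot(uv[0] - u, uv[1] - v) < threshold_px:
            inliers += 1
    return inliers


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    # Two consistent matches and one deliberately wrong prediction.
    matches = [
        ((320.0, 240.0), (0.0, 0.0, 2.0)),
        ((570.0, 240.0), (1.0, 0.0, 2.0)),
        ((400.0, 365.0), (0.0, 1.0, 4.0)),
    ]
    print(count_inliers(matches, identity, [0.0, 0.0, 0.0], 500.0, 320.0, 240.0))
```

A pose that agrees with the scene yields a high inlier count, while a wrong pose or unreliable scene coordinates (for example in repetitive or uninformative regions) yields a low one, which is why the quality of the regressed matches directly limits relocalization accuracy.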

Similar Items