Scene coordinate encoding for 3D representation and relocalization
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Master by Coursework |
Language: | English |
Published: | Nanyang Technological University, 2025 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/182472 |
Institution: | Nanyang Technological University |
Summary: | Visual localization, which estimates a camera’s position and orientation within a known scene, is crucial for robotics, autonomous driving, and AR/VR applications. Scene Coordinate Regression (SCR) is a structure-based visual localization method that implicitly encodes map information in the weights of a deep neural network and directly regresses 2D-3D matches. However, current SCR methods rely on implicit triangulation for 3D representation, and the limited learning capacity of deep neural networks leaves them inadequate for representing repetitive textures and uninformative areas.
We investigate the limitations of SCR in scene feature representation and relocalization accuracy, and improve on previous methods in both scene coordinate encoding and inter-frame localization optimization. We design a network architecture that simultaneously encodes 3D scenes and extracts salient keypoints. Moreover, we introduce a mechanism that leverages sequential information during map encoding and relocalization to strengthen implicit triangulation, particularly in environments with repetitive textures.
We compare our approach with current state-of-the-art (SOTA) visual relocalization methods on indoor and outdoor datasets. Both our single-frame and sequence-based relocalization modes outperform other SOTA methods in frame rate and localization accuracy. |