Scene coordinate encoding for 3D representation and relocalization

Bibliographic Details
Main Author: Jiang, Zeyu
Other Authors: Xie Lihua
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2025
Online Access: https://hdl.handle.net/10356/182472
Summary: Visual localization, the task of estimating a camera's position and orientation within a known scene, is crucial for robotics, autonomous driving, and AR/VR applications. Scene Coordinate Regression (SCR) is a structure-based visual localization method that implicitly encodes map information in the weights of a deep neural network and directly regresses 2D-3D matches. However, because SCR relies on implicit triangulation for its 3D representation, the limited learning capacity of deep neural networks leaves current SCR methods inadequate for representing repetitive textures and uninformative regions. We investigate the problems of SCR with respect to scene feature representation and relocalization accuracy, and we improve on previous methods in both scene coordinate encoding and inter-frame localization optimization. We design a network architecture that simultaneously encodes 3D scenes and extracts salient keypoints. Moreover, we introduce a mechanism that leverages sequential information during map encoding and relocalization to strengthen implicit triangulation, particularly in environments with repetitive textures. We conduct comparative experiments on indoor and outdoor datasets against current state-of-the-art (SOTA) visual relocalization methods. Our single-frame and sequence-based relocalization modes outperform the other SOTA methods in both frame rate and localization accuracy.