Methods for large-scale image-based localization using structure-from-motion point clouds

Bibliographic Details
Main Author: Cheng, Wentao
Other Authors: Lin Weisi (School of Computer Science and Engineering)
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2020
Subjects: Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Online Access: https://hdl.handle.net/10356/137803
DOI: 10.32657/10356/137803
Citation: Cheng, W. (2020). Methods for large-scale image-based localization using structure-from-motion point clouds. Doctoral thesis, Nanyang Technological University, Singapore.
License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
Institution: Nanyang Technological University

Full description

Image-based localization, i.e. estimating the camera pose of an image, is a fundamental task in many 3D computer vision applications. For instance, visual navigation for self-driving cars and robots, mixed reality and augmented reality all rely on this essential task. Due to their easy availability and richness of information, large-scale 3D point clouds reconstructed from images via Structure-from-Motion (SfM) techniques have received broad attention in the area of image-based localization. Therein, the 6-DOF camera pose can be computed from 2D-3D matches established between a query image and an SfM point cloud. Over the last decade, many image-based localization methods have been proposed to handle large-scale SfM point clouds, and significant improvements have been achieved in many aspects. Yet it remains difficult but worthwhile to build a system that (i) robustly handles the prohibitively expensive memory consumption brought by large-scale SfM point clouds, (ii) resolves the match disambiguation problem, i.e. distinguishing correct matches from wrong ones, which is even more challenging in urban scenes or under binary feature representations, and (iii) achieves high localization accuracy, so that the system can be safely applied in applications with low tolerance for failure, such as autonomous driving. In this thesis, we propose three methods that tackle these challenging problems and take a further step towards such an ultimate system.
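
The pose computation the abstract refers to is, in the standard pipeline, a Perspective-n-Point (PnP) solve inside RANSAC. As a point of reference only (this sketch is not taken from the thesis; the function name, inputs and thresholds are illustrative), it can be written with OpenCV as:

```python
# Minimal sketch: recover a 6-DOF camera pose from 2D-3D matches with
# RANSAC-based PnP (OpenCV). All names and thresholds are illustrative.
import numpy as np
import cv2

def estimate_pose(pts3d, pts2d, K):
    """pts3d: (N, 3) SfM points; pts2d: (N, 2) matched image keypoints;
    K: (3, 3) camera intrinsics. Returns (R, t, inlier_ids) or None."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float64), pts2d.astype(np.float64), K, None,
        reprojectionError=8.0,      # pixel threshold, dataset dependent
        iterationsCount=10000,
        flags=cv2.SOLVEPNP_P3P)     # 3-point minimal solver inside RANSAC
    if not ok or inliers is None:
        return None
    R, _ = cv2.Rodrigues(rvec)      # rotation vector -> 3x3 matrix
    return R, tvec, inliers.ravel()
```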

First, we aim to solve the memory consumption problem by simplifying a large-scale SfM point cloud to a small but highly informative subset. To this end, we propose a data-driven SfM point cloud simplification framework that automatically predicts a suitable parameter setting by exploiting the intrinsic visibility information. In addition, we introduce a weight function into the standard greedy SfM point cloud simplification algorithm, so that the most essential 3D points are well preserved. We experimentally evaluate the proposed framework on real-world large-scale datasets and show the robustness of the parameter prediction. The simplified SfM point clouds generated by our framework achieve better localization performance, which demonstrates the benefit of our framework for image-based localization on devices with limited memory resources.
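
For orientation, the "standard greedy algorithm" in this line of work selects points until every database image is covered by at least K chosen points. Below is a minimal sketch of such a weighted greedy K-cover; the weight function is a placeholder, not the one derived in the thesis, and the data layout (a point-to-image visibility dictionary with integer ids) is assumed:

```python
# Hedged sketch of weighted greedy K-cover simplification: pick points in
# order of weighted coverage gain until every image is covered K times.
import heapq

def simplify(visibility, num_images, K=100, weight=lambda p: 1.0):
    """visibility: dict point_id -> set of image ids observing the point."""
    remaining = {img: K for img in range(num_images)}   # uncovered budget
    # Max-heap keyed by weighted coverage gain (negated for heapq).
    heap = [(-weight(p) * len(imgs), p) for p, imgs in visibility.items()]
    heapq.heapify(heap)
    selected = []
    while heap and remaining:
        neg_gain, p = heapq.heappop(heap)
        gain = weight(p) * sum(1 for i in visibility[p] if i in remaining)
        if gain <= 0:
            continue
        if -neg_gain > gain + 1e-9:            # stale entry: lazy re-insert
            heapq.heappush(heap, (-gain, p))
            continue
        selected.append(p)
        for i in visibility[p]:                # charge the covered images
            if i in remaining:
                remaining[i] -= 1
                if remaining[i] == 0:
                    del remaining[i]
    return selected
```

The lazy heap update avoids rescoring every point after each selection, which matters at the scale of city-sized SfM models.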

Second, we investigate the match disambiguation problem in large-scale SfM point clouds depicting urban environments. Due to the density of the feature space and massive repetitive structures, this problem becomes challenging when relying solely on feature appearance. We therefore present a two-stage outlier filtering framework that leverages both the visibility and the geometry information of SfM point clouds. We first propose a visibility-based outlier filter, built on the bipartite visibility graph, to filter outliers at a coarse level. By deriving a data-driven geometrical constraint for urban environments, we then present a geometry-based outlier filter that generates a set of fine-grained matches. The proposed framework relies only on the intrinsic information of an SfM point cloud, and can thus be readily embedded into existing image-based localization approaches. Our framework handles match sets with very large outlier ratios and outperforms state-of-the-art image-based localization methods in terms of effectiveness.
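
To make the coarse stage concrete: one common way to exploit the bipartite point-image visibility graph is to let matches vote for the database images that observe their 3D points, then keep only matches tied to the top-voted images. The sketch below illustrates that generic idea only; the filter actually proposed in the thesis is defined there, and `top_k` is an assumed parameter:

```python
# Hedged sketch of coarse, visibility-based match filtering via camera
# voting on the bipartite visibility graph (an illustrative stand-in).
from collections import Counter

def coarse_filter(matches, visibility, top_k=10):
    """matches: list of (keypoint_id, point_id); visibility maps
    point_id -> set of database image ids observing that point."""
    votes = Counter()
    for _, p in matches:
        votes.update(visibility[p])        # each match votes for its images
    top_images = {img for img, _ in votes.most_common(top_k)}
    return [(kp, p) for kp, p in matches
            if visibility[p] & top_images]  # keep matches seen by top images
```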

Last, we aim to build a general-purpose image-based localization system that simultaneously solves the memory consumption, match disambiguation and localization accuracy problems. We adopt a binary feature representation and propose a corresponding match disambiguation method that adequately utilizes the intrinsic feature, visibility and geometry information. The core idea is to divide the challenging disambiguation task into two different tasks before deriving an auxiliary camera pose for final disambiguation. One task focuses on preserving potentially correct matches, while the other focuses on obtaining high-quality matches to facilitate subsequent, more powerful disambiguation. Moreover, our system improves localization accuracy by introducing a quality-aware spatial reconfiguration method and a principal-focal-length-enhanced pose estimation method. Our experimental study confirms that the proposed system achieves superior localization accuracy while using significantly less memory than state-of-the-art methods.
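
As an illustration of the final step described above: once an auxiliary camera pose is available, matches can be disambiguated by reprojection error. The following sketch assumes a calibrated pinhole model and an illustrative pixel threshold; it is a generic instance of pose-based filtering, not the thesis's exact method:

```python
# Hedged sketch of pose-based final disambiguation: keep matches whose
# 3D points reproject close to their 2D keypoints under an auxiliary pose.
import numpy as np

def filter_by_reprojection(pts3d, pts2d, K, R, t, thresh_px=10.0):
    """pts3d: (N, 3), pts2d: (N, 2), K: intrinsics, (R, t): auxiliary pose.
    Returns a boolean inlier mask over the N matches."""
    cam = pts3d @ R.T + t.reshape(1, 3)        # world -> camera coordinates
    in_front = cam[:, 2] > 1e-6                # drop points behind the camera
    proj = cam @ K.T                           # apply intrinsics
    proj = proj[:, :2] / np.clip(proj[:, 2:3], 1e-6, None)  # perspective divide
    err = np.linalg.norm(proj - pts2d, axis=1) # pixel reprojection error
    return in_front & (err < thresh_px)
```

A final pose would then typically be re-estimated from the surviving matches, which is what lets the system tolerate the noisier correspondences that binary descriptors produce.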