Visual metric and semantic localization for UGV
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2022 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/162442 |
Institution: | Nanyang Technological University |
Summary: | As vision-based algorithms move from lab research to real-world applications, they face significant challenges, such as robustness in complex environments and the heavy demands of multi-task, multi-modal learning models. For visual simultaneous localization and mapping (visual SLAM) on mobile robots, two notable limitations are: (i) drift during pose estimation, especially in dynamic environments, makes the positioning system unstable; (ii) the additional loop detection consumes substantial computational resources for image retrieval and global feature geometry checking. This thesis explores visual localization approaches for unmanned ground vehicles that address these limitations.
The conventional feature extraction and matching pipeline for vision-based tasks involves three steps: (i) local feature detection and description, (ii) feature matching, and (iii) outlier rejection. The first contribution of the thesis adds a feature selection and anticipation stage to reduce tracking drift. We explore the noise model of image features to select, from all observed image features, the subset with the best "contribution" during data association and pose estimation across multiple frames.
Conventional SLAM algorithms make a strong assumption of scene rigidity, which limits their application in challenging environments. The second part of the thesis addresses dynamic environments with moving objects. We present GMC, a motion clustering approach that serves as a lightweight dynamic object filtering method. It distinguishes moving objects from static landmarks. Based on the theory of motion coherence within a particular image area, GMC can segment dynamic objects in 3D space. With this method, we provide an efficient and robust correspondence algorithm that extracts dynamic objects from a static background. In this way, we propose a dynamic SLAM system that runs in real time and does not require expensive GPU processors.
In contrast to GMC, the third part of the thesis turns to an object-aware, learning-based model for more general dynamic scenarios. We use object detections and tracks as features, in the same way as points, lines, planes, etc. We utilize semantic information and extract sparse image features simultaneously to keep track of dynamic objects. The static background and the different dynamic objects are jointly optimized in a newly developed sliding-window bundle adjustment. The estimated 3D bounding boxes can provide more robust camera tracking, better scene understanding, and better map merging.
The fourth part of the thesis leverages the emerging feature learning framework. It proposes a unified self-supervised model called LGDNet that generates both global and local image feature descriptors end-to-end. Global feature descriptors embed the whole image into a compact representation, which makes scene comparison easier. Local features, by contrast, focus on the local similarities of salient regions for structure from motion. The proposed method directly extracts features together with descriptors that encode both local maximum responses and global context information, avoiding duplicate calculations based on different feature extraction criteria. |
---|
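
The summary lists three steps in the conventional pipeline: detection and description, feature matching, and outlier rejection. The last two can be illustrated with a minimal sketch. It is not the thesis's implementation. The function names `match_features` and `reject_outliers`, the toy descriptors, and the thresholds are all hypothetical. Matching uses a nearest-neighbour ratio test, and outlier rejection is a simplified stand-in for geometric verification that assumes the dominant image motion is a pure translation.

```python
import math
import statistics


def match_features(desc_a, desc_b, ratio=0.8):
    """Step (ii): nearest-neighbour matching with a ratio test.

    Returns (index_in_a, index_in_b) pairs. Hypothetical helper, not from the thesis.
    """
    matches = []
    for i, da in enumerate(desc_a):
        ranked = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(ranked) < 2:
            continue  # the ratio test needs at least two candidates
        (best, j), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches


def reject_outliers(matches, kp_a, kp_b, tol=3.0):
    """Step (iii): keep matches whose displacement agrees with the median.

    Assumption: the dominant motion between the two frames is a translation.
    """
    if not matches:
        return []
    dxs = [kp_b[j][0] - kp_a[i][0] for i, j in matches]
    dys = [kp_b[j][1] - kp_a[i][1] for i, j in matches]
    mx, my = statistics.median(dxs), statistics.median(dys)
    return [
        (i, j)
        for (i, j), dx, dy in zip(matches, dxs, dys)
        if math.hypot(dx - mx, dy - my) <= tol
    ]


# Toy data: four features; the last one lands far from the common motion,
# as a mismatch or a feature on a moving object would.
desc_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
desc_b = [(0.02, 0.0), (1.0, 0.03), (0.0, 0.98), (1.01, 1.0)]
kp_a = [(10.0, 10.0), (20.0, 15.0), (30.0, 40.0), (50.0, 5.0)]
kp_b = [(12.0, 11.0), (22.0, 16.0), (32.0, 41.0), (80.0, 60.0)]

matches = match_features(desc_a, desc_b)
inliers = reject_outliers(matches, kp_a, kp_b)
print(matches)
print(inliers)
```

In this toy case all four descriptors pass the ratio test, and the fourth match is then discarded because its displacement disagrees with the median. A real system would typically replace the translation check with a robust geometric model fit.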