Fast and robust visual SLAM for dynamic environments
Saved in:
Main Author: | Singh Gaurav |
---|---|
Other Authors: | Lam Siew Kei |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2022 |
Subjects: | Engineering::Computer science and engineering::Computer applications |
Online Access: | https://hdl.handle.net/10356/155251 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-155251
record_format | dspace
institution | Nanyang Technological University
building | NTU Library
continent | Asia
country | Singapore
content_provider | NTU Library
collection | DR-NTU
language | English
topic | Engineering::Computer science and engineering::Computer applications
description |
Autonomous mobile robots need to perform self-localization with respect to their environments in order to achieve safe navigation. The self-localization of these autonomous robots is usually jointly addressed as the problem of Simultaneous Localization and Mapping (SLAM). The focus of SLAM has recently shifted towards vision-based approaches, which can provide higher robustness as well as the ability to generate semantically rich maps. However, visual SLAM (vSLAM) systems suffer from high computational complexity and are unable to deal with dynamic objects in the scene and changing scene conditions. This thesis aims to develop a vSLAM framework that is robust to such dynamic environments while being able to run efficiently on resource-constrained platforms.

We first propose a real-time solution for Visual Odometry (VO) that achieves high pose accuracy. In particular, an efficient feature correspondence setup scheme is introduced to generate high-quality feature matches that are evenly distributed over the image. A new adaptive technique that rapidly and efficiently removes outliers is presented, which overcomes the computational complexity of existing outlier removal schemes. In addition, a new pose optimization step is introduced to mitigate problems associated with far features, which often lead to high residual errors. The proposed VO is evaluated on the popular KITTI dataset by comparing it with top-performing VO and vSLAM systems in terms of speed and accuracy. Results show that the proposed VO achieves the fastest speed compared to all the top-ranked VO and vSLAM systems on the KITTI leaderboard. The proposed VO is 47% faster than the state-of-the-art ORB-SLAM2, with comparable accuracy. Next, studies are undertaken to examine the impact of dynamic objects on the accuracy of pose estimates. Our studies highlight the importance of distinguishing between the motion states of potentially moving objects for vSLAM in highly dynamic environments.
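The record only summarizes the adaptive outlier-removal step, so the details are not available here. As a rough illustrative sketch of the general idea of consensus-based outlier rejection for feature matches (a RANSAC-style scheme with a pure-translation model; all names and parameters below are hypothetical, not taken from the thesis):

```python
import random

def ransac_outlier_removal(matches, threshold=2.0, max_iters=200, seed=0):
    """Reject feature correspondences inconsistent with the dominant motion.

    matches: list of ((x1, y1), (x2, y2)) point correspondences.
    A minimal sample (one match) hypothesizes an image-plane translation;
    the hypothesis with the largest consensus set defines the inliers.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(max_iters):
        p, q = rng.choice(matches)            # minimal sample: one match
        dx, dy = q[0] - p[0], q[1] - p[1]     # candidate translation
        inliers = [m for m in matches
                   if abs((m[1][0] - m[0][0]) - dx) < threshold
                   and abs((m[1][1] - m[0][1]) - dy) < threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers
```

A real VO front end would use an epipolar (essential-matrix) model rather than a translation, but the shortlist-and-verify loop is the same shape.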
We propose a semantic vSLAM framework that increases the robustness of existing vSLAM systems by accurately removing moving objects from the scene so that they do not contribute to pose estimation and mapping. Semantic information is fused with the motion states of the scene via a probability framework, enabling accurate and robust moving-object extraction that retains the useful features for pose estimation and mapping. We performed extensive experiments on well-known datasets to show that the proposed technique outperforms existing vSLAM methods in complex indoor and outdoor environments, under various dynamic scenarios such as crowded scenes.

In order to accelerate our semantic vSLAM framework on embedded platforms, we propose a lightweight keyframe-only semantic generation method. Our approach extracts semantics only on keyframes (i.e., frames with significant changes in image content), and semantic propagation is used to compensate for the changes in the intermediate frames. This is achieved by computing a dense transformation map from the available feature flow vectors. A novel motion-state detection algorithm is employed to identify regions in the scene with a high probability of motion, compensating for errors in the propagated semantics. This information is then fused with semantic cues using the previously proposed probability framework to retain the useful features for pose estimation and mapping. We implemented our semantic vSLAM framework on the embedded Jetson TX1 and performed extensive experiments on four well-known datasets to show that it outperforms existing vSLAM methods in complex indoor and outdoor environments under various dynamic scenarios.

Finally, we extend our semantic vSLAM framework to long-term localization by enabling it to adapt to varying scene conditions. To achieve this, we increase the robustness of the loop detection (relocalization) task in vSLAM by using global semantic structure descriptors, which are more stable than conventional local features under changing scene conditions. We introduce a novel hierarchical loop detection method that relies on the global semantic structure descriptors to first identify a coarse location, which is further refined using local feature descriptor-based bag-of-words (BOW).
In addition, semantic class-wise local BOW vocabulary trees are built to increase the descriptiveness of the vocabulary for within-class words. The experiments demonstrate that the proposed hierarchical loop detection method achieves significantly lower query times than existing state-of-the-art loop detection methods, with higher recall rates at 100% precision. Furthermore, the proposed hierarchical loop detection does not require any offline training for vocabularies or places. |
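The coarse-to-fine lookup described above can be sketched as a two-stage query: a cheap global descriptor narrows the database to a few candidate keyframes, and a finer BOW-style vector decides among them. This is a minimal illustration under assumed data shapes (dictionaries with hypothetical `id`, `global`, and `bow` fields; cosine similarity as the matching score), not the thesis implementation:

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two descriptor vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def hierarchical_loop_query(query, database, coarse_k=3):
    """Two-stage loop-closure lookup.

    Stage 1: rank all keyframes by a global (e.g., semantic-structure)
    descriptor and keep the top coarse_k candidates.
    Stage 2: re-rank the shortlist with the local-feature BOW vector.
    """
    shortlist = sorted(database,
                       key=lambda kf: cosine_sim(query["global"], kf["global"]),
                       reverse=True)[:coarse_k]
    best = max(shortlist, key=lambda kf: cosine_sim(query["bow"], kf["bow"]))
    return best["id"]
```

The payoff of the hierarchy is that the expensive fine comparison runs on only `coarse_k` candidates instead of the whole map, which is where the lower query times would come from.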
author | Singh Gaurav
author2 | Lam Siew Kei
format | Thesis-Doctor of Philosophy
title | Fast and robust visual SLAM for dynamic environments
publisher | Nanyang Technological University
publishDate | 2022
url | https://hdl.handle.net/10356/155251
_version_ | 1759058777640796160
spelling |
sg-ntu-dr.10356-155251 2023-02-08T08:02:06Z Fast and robust visual SLAM for dynamic environments Singh Gaurav Lam Siew Kei School of Computer Science and Engineering Hardware & Embedded Systems Lab (HESL) ASSKLam@ntu.edu.sg Engineering::Computer science and engineering::Computer applications Doctor of Philosophy 2022-02-15T02:57:20Z 2022 Thesis-Doctor of Philosophy Singh Gaurav (2022). Fast and robust visual SLAM for dynamic environments. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/155251 10.32657/10356/155251 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University |