Fast and robust visual SLAM for dynamic environments

Bibliographic Details
Main Author: Singh Gaurav
Other Authors: Lam Siew Kei
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2022
Subjects: Engineering::Computer science and engineering::Computer applications
Online Access:https://hdl.handle.net/10356/155251
Institution: Nanyang Technological University

Description

Autonomous mobile robots need to localize themselves with respect to their environments in order to navigate safely. This self-localization is usually addressed jointly with mapping, as the problem of Simultaneous Localization and Mapping (SLAM). The focus of SLAM has recently shifted towards vision-based approaches, which offer higher robustness as well as the ability to generate semantically rich maps. However, visual SLAM (vSLAM) systems suffer from high computational complexity and struggle with dynamic objects in the scene and changing scene conditions. This thesis aims to develop a vSLAM framework that is robust to such dynamic environments while running efficiently on resource-constrained platforms.

We first propose a real-time solution for Visual Odometry (VO) that achieves high pose accuracy. In particular, an efficient feature correspondence setup scheme is introduced to generate high-quality feature matches that are evenly distributed over the image. A new adaptive technique that rapidly and efficiently removes outliers is presented, which overcomes the computational complexity of existing outlier removal schemes. In addition, a new pose optimization step is introduced to mitigate the high residual errors often associated with far features. The proposed VO is evaluated on the popular KITTI dataset against top-performing VO and vSLAM systems in terms of speed and accuracy. Results show that it is the fastest among the top-ranked VO and vSLAM systems on the KITTI leaderboard, and that it is 47% faster than the state-of-the-art ORB-SLAM2 with comparable accuracy.

Next, we study the impact of dynamic objects on the accuracy of pose estimates. Our studies highlight the importance of distinguishing between the motion states of potentially moving objects for vSLAM in highly dynamic environments. We propose a semantic vSLAM framework that increases the robustness of existing vSLAM systems by accurately removing moving objects from the scene so that they do not contribute to pose estimation and mapping. Semantic information is fused with the motion states of the scene via a probability framework, enabling accurate and robust moving-object extraction while retaining the features that are useful for pose estimation and mapping. Extensive experiments on well-known datasets show that the proposed technique outperforms existing vSLAM methods in complex indoor and outdoor environments, under various dynamic scenarios such as crowded scenes.

To accelerate our semantic vSLAM framework on embedded platforms, we propose a lightweight keyframe-only semantic generation method. Our approach extracts semantics only on keyframes (i.e., frames with significant changes in image content) and uses semantic propagation to compensate for the changes in the intermediate frames. This is achieved by computing a dense transformation map from the available feature flow vectors. A novel motion state detection algorithm, which compensates for the propagated semantics, is employed to identify regions in the scene with a high probability of moving. This information is then fused with the semantic cues using the previously proposed probability framework to retain the useful features for pose estimation and mapping.
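
The exact form of the probability framework is not given in this record. As a minimal illustrative sketch only, the fusion of a per-class semantic prior with a binary motion observation into a per-feature keep/discard decision could look like the following Python; the Bayesian update rule, the class priors, and all names here are assumptions for illustration, not the thesis' implementation.

# Illustrative sketch: fusing a semantic moving-object prior with an observed
# motion state into a per-feature "keep" decision. The Bayesian update rule
# and the class priors below are assumptions made for illustration.

SEMANTIC_MOVING_PRIOR = {   # hypothetical priors per semantic class
    "person": 0.90,
    "car": 0.60,
    "building": 0.01,
    "road": 0.01,
}

def moving_probability(semantic_class, motion_evidence,
                       p_obs_if_moving=0.8, p_obs_if_static=0.1):
    """Update the semantic prior with a binary motion observation.

    motion_evidence is True when the feature's flow disagrees with the
    estimated camera ego-motion (e.g., a large epipolar residual).
    """
    prior = SEMANTIC_MOVING_PRIOR.get(semantic_class, 0.2)
    if motion_evidence:
        num = p_obs_if_moving * prior
        den = num + p_obs_if_static * (1.0 - prior)
    else:
        num = (1.0 - p_obs_if_moving) * prior
        den = num + (1.0 - p_obs_if_static) * (1.0 - prior)
    return num / den

def keep_feature(semantic_class, motion_evidence, threshold=0.5):
    # Features that are probably moving are excluded from pose estimation.
    return moving_probability(semantic_class, motion_evidence) < threshold

# A parked car keeps its features; a car observed to move loses them.
assert keep_feature("car", motion_evidence=False)
assert not keep_feature("car", motion_evidence=True)

The point of such a fusion is visible in the example: semantics alone would discard every car, while motion evidence alone would miss slow or temporarily stopped objects; combining the two retains features on static instances of "movable" classes.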

We implemented our semantic vSLAM framework on the embedded Jetson TX1 and performed extensive experiments on four well-known datasets, showing that it outperforms existing vSLAM methods in complex indoor and outdoor environments under various dynamic scenarios.

Finally, we extend our semantic vSLAM framework to long-term localization by enabling it to adapt to varying scene conditions. To achieve this, we increase the robustness of the loop detection (relocalization) task in vSLAM by using global semantic structure descriptors, which are more stable than conventional local features under changing scene conditions. We introduce a novel hierarchical loop detection method that relies on the global semantic structure descriptors to first identify a coarse location, which is then refined using a local feature descriptor-based bag-of-words (BoW) search. In addition, semantic class-wise local BoW vocabulary trees are built to increase the descriptiveness of the vocabulary for within-class words. The experiments demonstrate that the proposed hierarchical loop detection method achieves significantly lower query times than existing state-of-the-art loop detection methods, with improved recall rates at 100% precision. Furthermore, the proposed hierarchical loop detection does not require any offline training of vocabularies or places.
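
Again as an illustration only, the hierarchical query can be pictured as a cheap global-descriptor ranking followed by a local BoW refinement. The descriptor forms, the similarity metrics, and all names in this Python sketch are assumptions rather than the method's actual details.

import numpy as np

# Illustrative two-stage loop-closure query: a global semantic-structure
# descriptor prunes the keyframe database before the more expensive local
# bag-of-words comparison runs. All metrics and names are illustrative.

def coarse_candidates(query_global, db_globals, top_k=10):
    """Stage 1: rank keyframes by global descriptor similarity.

    query_global: (D,) L2-normalized vector, e.g. a per-class spatial
    histogram of the semantic segmentation. db_globals: (N, D) array.
    """
    sims = db_globals @ query_global            # cosine similarities
    return np.argsort(-sims)[:top_k]

def bow_similarity(v1, v2):
    """Stage 2 score: L1-based similarity of L1-normalized BoW vectors."""
    return 1.0 - 0.5 * np.abs(v1 - v2).sum()

def detect_loop(query_global, query_bow, db_globals, db_bows,
                accept_threshold=0.7):
    best_id, best_score = None, 0.0
    for idx in coarse_candidates(query_global, db_globals):
        score = bow_similarity(query_bow, db_bows[idx])
        if score > best_score:
            best_id, best_score = int(idx), score
    # Report a loop only when the refined score clears the threshold,
    # trading some recall for high precision.
    return best_id if best_score >= accept_threshold else None

Keeping stage 1 cheap is what drives the reduced query time: the expensive BoW comparison only runs on the few candidates that survive the coarse semantic ranking.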

School: School of Computer Science and Engineering
Research Group: Hardware & Embedded Systems Lab (HESL)
Supervisor Contact: ASSKLam@ntu.edu.sg
Citation: Singh Gaurav (2022). Fast and robust visual SLAM for dynamic environments. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/155251
DOI: 10.32657/10356/155251
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).