Robust visual SLAM for autonomous vehicles in challenging environments
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2022
Online Access: https://hdl.handle.net/10356/163428
Institution: Nanyang Technological University
Summary: Autonomous vehicles such as UAVs and AGVs have received increasing attention over the past decades due to a wide range of applications in many areas. To accomplish robotic tasks intelligently, Simultaneous Localization and Mapping (SLAM) is considered a fundamental capability for mobile robots. The objective of SLAM is to simultaneously estimate the pose of the robot and build a map of the unknown environment from the data of on-board sensors. Driven by the demand for low-cost and high-efficiency solutions, the development of autonomous localization and navigation capabilities for autonomous vehicles using on-board sensors has become a popular research topic in the robotics community.
Benefiting from various low-cost, lightweight off-the-shelf cameras, vision-based SLAM, or visual SLAM, has played an important role in many robotic applications. A number of visual SLAM systems have been proposed in the literature using different types of cameras, such as monocular SLAM, RGB-D SLAM, and stereo SLAM. These methods have demonstrated impressive performance in specific scenarios and environments. However, open problems remain for visual SLAM in challenging environments. First, since most visual SLAM methods rely on features extracted from the environment to estimate the visual odometry of the camera, they may perform poorly or even fail in low-textured environments with few features. Second, most existing visual SLAM methods rely heavily on a static-world assumption and easily fail in dynamic environments involving moving objects. Third, the computational resources of small-scale autonomous vehicles are often limited, hence real-time performance needs to be taken into consideration. To address these issues, we aim to improve the robustness of visual SLAM for autonomous vehicles in challenging environments, and introduce both geometric and semantic methods.
For autonomous navigation and safe control of UAVs, accurate and reliable velocity and position estimation is essential. However, due to limited computational power and payload, autonomous operation of UAVs in complex environments remains challenging. In the first part of the thesis, we propose a robust and efficient velocity estimation framework for MAVs in cluttered environments using a single downward-facing RGB-D camera. Our method provides metric velocity estimation in three dimensions as well as the yaw rate for MAVs without fusing additional sensors. Moreover, based on a fast optical flow computation method that does not rely on time-consuming feature detection and matching, our approach runs in real time on MAVs.
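The abstract does not spell out the implementation, but the core idea (scaling optical flow by per-pixel depth from the downward-facing camera to obtain metric velocity) can be sketched roughly as follows. The use of OpenCV's Farneback dense flow, the grid sampling step, and the median aggregation are illustrative assumptions rather than the thesis' actual pipeline, and only the planar velocity and yaw rate are shown.

```python
import cv2
import numpy as np

def estimate_velocity(prev_gray, curr_gray, prev_depth, K, dt, step=16):
    """Illustrative sketch: planar metric velocity and yaw rate for a
    downward-facing RGB-D camera, from dense optical flow scaled by depth."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]

    # Dense optical flow: no explicit feature detection or matching.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

    # Sample the flow field and the depth image on a coarse regular grid.
    h, w = prev_gray.shape
    vs, us = np.mgrid[step // 2:h:step, step // 2:w:step]
    du, dv = flow[vs, us, 0].ravel(), flow[vs, us, 1].ravel()
    z = prev_depth[vs, us].ravel()
    u0, v0 = us.ravel().astype(float), vs.ravel().astype(float)
    valid = z > 0
    u0, v0, du, dv, z = u0[valid], v0[valid], du[valid], dv[valid], z[valid]

    # Back-project the sampled pixels and their flow endpoints to metric
    # coordinates, assuming each patch's depth changes little between frames.
    x0, y0 = (u0 - cx) * z / fx, (v0 - cy) * z / fy
    x1, y1 = (u0 + du - cx) * z / fx, (v0 + dv - cy) * z / fy

    # Translational velocity from the median metric displacement over dt.
    vx, vy = np.median((x1 - x0) / dt), np.median((y1 - y0) / dt)

    # Yaw rate from the rotation of displacements about the optical axis.
    dtheta = np.arctan2(y1, x1) - np.arctan2(y0, x0)
    dtheta = np.arctan2(np.sin(dtheta), np.cos(dtheta))  # wrap to [-pi, pi]
    yaw_rate = np.median(dtheta) / dt

    return vx, vy, yaw_rate
```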
Although a camera alone can provide satisfactory localization for autonomous vehicles, navigation in real scenarios remains challenging due to unreliable depth measurement and inaccurate global map building. In many industrial scenarios, 2D LiDAR is another popular localization device due to its relatively low cost and high accuracy; however, it can only perceive a single 2D plane. To enhance the capabilities and intelligence of traditional AGVs equipped with 2D LiDAR, we propose an integration framework that combines the advantages of a camera and a 2D LiDAR for robust navigation in warehouse environments. 2D LiDAR has the advantage of providing an accurate occupancy map, which is essential for path planning, but it is unable to detect obstacles in 3D space. To solve this, we propose an effective obstacle detection method in 3D space using an RGB-D camera, which can be directly used for obstacle avoidance in 2D LiDAR map-based navigation.
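A minimal sketch of this idea, under assumed parameters (grid resolution, height band, and camera-to-world extrinsics), marks cells of the 2D occupancy grid as occupied whenever back-projected RGB-D points fall inside the robot's height band; the function below is illustrative, not the thesis' implementation.

```python
import numpy as np

def mark_3d_obstacles(depth, K, T_world_cam, grid, origin, resolution,
                      z_min=0.05, z_max=1.8):
    """Illustrative sketch: project RGB-D points into the world frame and flag
    cells of a 2D occupancy grid that contain obstacles within a height band."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    h, w = depth.shape

    # Back-project every valid depth pixel to a 3D point in the camera frame.
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    valid = z > 0
    x = ((us.ravel() - cx) * z / fx)[valid]
    y = ((vs.ravel() - cy) * z / fy)[valid]
    pts_cam = np.stack([x, y, z[valid], np.ones(valid.sum())], axis=0)

    # Transform the points into the world frame shared with the 2D LiDAR map.
    pts_world = (T_world_cam @ pts_cam)[:3]

    # Keep only points in the height band that the planar LiDAR cannot observe.
    band = (pts_world[2] > z_min) & (pts_world[2] < z_max)
    px, py = pts_world[0, band], pts_world[1, band]

    # Rasterise the surviving points into occupied grid cells.
    gx = ((px - origin[0]) / resolution).astype(int)
    gy = ((py - origin[1]) / resolution).astype(int)
    inside = (gx >= 0) & (gx < grid.shape[1]) & (gy >= 0) & (gy < grid.shape[0])
    grid[gy[inside], gx[inside]] = 100  # occupied (ROS-style occupancy value)
    return grid
```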
To track the position of vehicles using on-board cameras, most existing visual SLAM algorithms focus on points, either by feature matching or by direct alignment of pixels, while ignoring other common but valuable geometric primitives such as lines and planes in the scene. In low-textured environments, it is often difficult to find a sufficient number of point features and, as a consequence, the performance of such algorithms degrades. To take full advantage of the available geometric information in the environment, in the third part of the thesis we propose a multi-landmark SLAM framework that combines point, line and plane features to benefit both tracking and mapping for autonomous vehicles in indoor environments. For tracking, we develop an optimization framework that integrates the different geometric features extracted from an RGB-D camera. For mapping, we combine the different features to build a structural map of the environment.
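The abstract only names the landmark types, so the following is a schematic of how point, line and plane residuals are commonly combined in a single least-squares tracking objective; the residual definitions, weights and data layout here are assumptions for illustration, not the exact formulation used in the thesis.

```python
import numpy as np

def point_residual(T, p_world, uv_obs, K):
    """Reprojection error (pixels) of a 3D point landmark; T maps world -> camera."""
    p_cam = T[:3, :3] @ p_world + T[:3, 3]
    u = K[0, 0] * p_cam[0] / p_cam[2] + K[0, 2]
    v = K[1, 1] * p_cam[1] / p_cam[2] + K[1, 2]
    return np.array([u, v]) - uv_obs

def line_residual(T, endpoints_world, line_2d, K):
    """Distance of the projected 3D line endpoints to the detected 2D line,
    given in homogeneous form l = (a, b, c)."""
    a, b, c = line_2d
    res = []
    for p in endpoints_world:
        p_cam = T[:3, :3] @ p + T[:3, 3]
        u = K[0, 0] * p_cam[0] / p_cam[2] + K[0, 2]
        v = K[1, 1] * p_cam[1] / p_cam[2] + K[1, 2]
        res.append((a * u + b * v + c) / np.hypot(a, b))
    return np.array(res)

def plane_residual(T, plane_world, plane_obs):
    """Difference between a mapped plane (n, d) expressed in the camera frame
    and the plane fitted in the current RGB-D frame; T maps world -> camera."""
    n_w, d_w = plane_world[:3], plane_world[3]
    n_c = T[:3, :3] @ n_w
    d_c = d_w - n_c @ T[:3, 3]
    return np.concatenate([n_c - plane_obs[:3], [d_c - plane_obs[3]]])

def total_cost(T, points, lines, planes, w_pt=1.0, w_ln=1.0, w_pl=1.0):
    """Joint tracking objective: weighted sum of squared point, line and plane
    residuals. Each list holds the argument tuples of the residual above."""
    cost = sum(w_pt * np.sum(point_residual(T, *p) ** 2) for p in points)
    cost += sum(w_ln * np.sum(line_residual(T, *l) ** 2) for l in lines)
    cost += sum(w_pl * np.sum(plane_residual(T, *q) ** 2) for q in planes)
    return cost
```

In practice such an objective would be minimized over the SE(3) camera pose with a nonlinear least-squares solver (e.g. Gauss-Newton in g2o or Ceres), typically with robust kernels on each residual term.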
All the systems proposed above, and many other state-of-the-art visual SLAM solutions, implicitly assume a static environment, meaning that there are no moving objects in the camera FoV and the estimated motion comes only from the moving camera. As a consequence, the performance of these systems may degrade, or they may even fail, when there are moving objects in the scene, such as people or other vehicles. Therefore, in the next two parts of the thesis, we aim to improve the robustness of visual SLAM in dynamic environments. We first propose a geometric method using K-means clustering to detect dynamic parts in images, which does not require prior information about the moving objects. With the development of deep learning, we further explore the integration of semantic information for visual SLAM in dynamic environments. To reduce the computational cost, we perform semantic segmentation only on keyframes to remove known dynamic objects, and maintain a static map for robust camera tracking. In addition, the geometry module from the previous part is integrated to handle unknown moving objects. Our system runs in real time on a low-power embedded platform and provides high localization accuracy in dynamic environments.
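As a rough illustration of the geometric part, the sketch below clusters map points with K-means and flags clusters with a large average residual as dynamic; the cluster count, the choice of residual, and the threshold are assumptions for illustration, not the thesis' exact criteria.

```python
import numpy as np
from sklearn.cluster import KMeans

def find_dynamic_points(points_3d, residuals, k=5, residual_thresh=0.05):
    """Illustrative sketch: group points by 3D position with K-means and flag
    clusters whose mean reprojection/depth residual is large as dynamic, so
    their points can be excluded from camera tracking."""
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(points_3d)

    dynamic_mask = np.zeros(len(points_3d), dtype=bool)
    for c in range(k):
        members = labels == c
        # A cluster dominated by large residuals likely belongs to a moving object.
        if residuals[members].mean() > residual_thresh:
            dynamic_mask[members] = True
    return dynamic_mask
```

Points flagged as dynamic, together with points falling inside keyframe segmentation masks of known dynamic classes (e.g. people), would then be excluded from tracking and from the static map.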