Enhancing robustness and efficiency in visual SLAM through integration of deep learning-based semantic segmentation techniques
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2024
Online Access: https://hdl.handle.net/10356/175555
Institution: Nanyang Technological University
Summary: Visual SLAM is a robotics system that enables robots to traverse new environments without prior information. With a camera as its main sensor, it achieves this objective through localisation and mapping algorithms based on visual information. To counter the vulnerability of visual SLAM to dynamic elements, Semantic SLAM augments visual SLAM systems with semantic information from machine learning models. This Final Year Project delves into the field of Semantic SLAM to identify and counter its unique challenges. Two main methods, each targeting a separate group of challenges, are conceptualized and evaluated.
The first method tackles the inflexible use of semantic labels in Semantic SLAM. It develops a combined moving probability that incorporates both semantic and geometric information, allowing a more precise estimate of the probability that a feature point is moving within a scene. The method achieved aggregate improvements in both global and local accuracy across low-dynamic scenes; most significantly, an overall 12.5% improvement in local translational accuracy over traditional Semantic SLAM. In low-dynamic scenes with faster and less stable camera movements, the combined probability delivers larger improvements of 17 to 22% in local translational and rotational accuracy. These gains in low-dynamic scenes are also well balanced against performance in high-dynamic scenes, exemplified by global improvements of 92.5% over traditional SLAM: in high-dynamic scenes, the average global error is kept to at most 13 centimetres, as opposed to 101 centimetres for traditional ORB-SLAM. Furthermore, the method operates in real time by relocating the execution of semantic segmentation to a separate thread.
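The combined moving probability described above can be sketched as follows. This is an illustrative assumption, not the report's exact formulation: the function name, the epipolar-error cue as the geometric term, and the convex-combination fusion rule are all hypothetical stand-ins for whichever semantic prior, geometric residual, and fusion the project actually uses.

```python
import math

def combined_moving_probability(p_semantic: float,
                                epipolar_error_px: float,
                                error_scale_px: float = 2.0,
                                weight_semantic: float = 0.5) -> float:
    """Estimate, in [0, 1], the probability that a feature point is moving.

    Fuses a semantic prior (how likely the point's object class is to move,
    e.g. a person vs. a wall) with a geometric cue (how far the point
    deviates from the epipolar constraint implied by the estimated camera
    motion).
    """
    # Map the geometric residual to a pseudo-probability: a larger epipolar
    # error suggests the point moved independently of the camera.
    p_geometric = 1.0 - math.exp(-epipolar_error_px / error_scale_px)
    # Convex combination of the semantic and geometric cues.
    p = weight_semantic * p_semantic + (1.0 - weight_semantic) * p_geometric
    return min(max(p, 0.0), 1.0)

# A point labelled "person" with a large epipolar error is confidently
# flagged as dynamic; a point on static structure with a tiny error is not.
print(combined_moving_probability(0.9, 5.0))   # high: treat as dynamic
print(combined_moving_probability(0.05, 0.2))  # low: keep as static landmark
```

Because the geometric term is independent of the semantic label, a point on a nominally static object that nevertheless moves (a chair being pushed, say) can still be flagged, which is the flexibility the label-only approach lacks.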
The second method targets the imprecise segmentation boundaries of semantic segmentation models while further minimizing execution time for RGB-D Semantic SLAM. It develops a clustering-to-classification pipeline that divides the work between two stages: clustering handles segmentation, and a separate classifier handles labelling. The pipeline is developed with the aim of replacing the traditional use of semantic segmentation models. Although gains in segmentation and classification are limited, the method shows potential for more specialized use cases.
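The two-stage split behind the clustering-to-classification pipeline can be sketched as below. Only the division of responsibilities comes from the report; the depth-gap clustering rule, the threshold classifier, and all names are simplified assumptions standing in for whatever clustering algorithm and classifier the project actually employs.

```python
from typing import List

def cluster_by_depth(depths: List[float], gap: float = 0.3) -> List[List[int]]:
    """Stage 1 (segmentation): group point indices into clusters by sorting
    on depth and starting a new cluster wherever consecutive depths differ
    by at least `gap` metres. Depth gives segment boundaries directly,
    rather than inferring them from a learned segmentation mask."""
    order = sorted(range(len(depths)), key=lambda i: depths[i])
    clusters = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if depths[cur] - depths[prev] < gap:
            clusters[-1].append(cur)
        else:
            clusters.append([cur])
    return clusters

def classify_cluster(mean_depth: float) -> str:
    """Stage 2 (classification): toy stand-in for the per-cluster classifier,
    which in practice could be a lightweight model run on each cluster's
    image crop instead of a full semantic segmentation network."""
    return "near-object" if mean_depth < 1.5 else "background"

# Six depth readings (metres): three near points and three far points.
depths = [0.80, 0.82, 0.85, 2.90, 3.00, 3.10]
for cluster in cluster_by_depth(depths):
    mean_depth = sum(depths[i] for i in cluster) / len(cluster)
    print(classify_cluster(mean_depth), cluster)
```

The design intuition is that RGB-D clustering can produce segment boundaries at depth-sensor precision, while the classifier only needs to name each segment, so neither component has to do both jobs at once.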