Hybrid SLAM and object recognition on an embedded platform
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2023
Online Access: https://hdl.handle.net/10356/171979
Institution: Nanyang Technological University
Summary: Simultaneous Localization and Mapping (SLAM) is a key localisation technique for systems such as autonomous agents. Visual SLAM is a subset of SLAM that analyzes visual information captured by cameras, enabling the estimation of the camera's position and orientation while simultaneously constructing a map of the environment.
Conventional visual SLAM mainly uses sparse point clouds, which fail to capture a comprehensive representation of the environment. The maps created therefore lack semantic information about the environment, which limits scene understanding.
Autonomous agents that utilize visual SLAM are also commonly implemented on embedded
systems. As such, the objective of this project is to implement a hybrid SLAM system on an
embedded platform which incorporates conventional SLAM algorithms with semantic
segmentation to construct semantic dense maps.
The hybrid SLAM system is implemented and tested on the NVIDIA Jetson Xavier NX embedded system, with a ZED 2 stereo camera providing the input. ORB-SLAM3, a well-established visual SLAM algorithm, was chosen for estimating the camera's pose (position and orientation). SCALE semantic segmentation is used to perform semantic segmentation on the images provided by the ZED 2 camera. Kimera-Semantics then processes the camera poses together with the semantic and depth images to generate a semantic dense map.
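The core dataflow described above — a per-frame pose from the SLAM front end, a depth image, and a per-pixel label image fused into a labeled 3D map — can be sketched with the standard pinhole back-projection step that such a pipeline performs for each frame. This is a minimal NumPy sketch, not the actual ORB-SLAM3 or Kimera-Semantics API; the intrinsics (`FX`, `FY`, `CX`, `CY`) are illustrative placeholders, not the ZED 2's real calibration.

```python
import numpy as np

# Hypothetical pinhole intrinsics for illustration only
# (a real system would read these from the ZED 2 calibration).
FX, FY, CX, CY = 525.0, 525.0, 320.0, 240.0

def back_project(depth, labels, pose):
    """Lift one frame into labeled 3D points in the world frame.

    depth  : (H, W) array of metric depths
    labels : (H, W) array of per-pixel semantic class ids
    pose   : 4x4 camera-to-world transform (as a SLAM front end would estimate)
    Returns an (N, 4) array of [x, y, z, label] rows for valid-depth pixels.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0                       # skip pixels with no depth
    z = depth[valid]
    x = (u[valid] - CX) * z / FX            # pinhole back-projection
    y = (v[valid] - CY) * z / FY
    pts_cam = np.stack([x, y, z, np.ones_like(z)])   # homogeneous, camera frame
    pts_world = (pose @ pts_cam)[:3].T               # transform into world frame
    return np.hstack([pts_world, labels[valid][:, None]])

# Toy frame: a flat surface 2 m ahead, one semantic class, identity pose.
depth = np.full((4, 4), 2.0)
labels = np.full((4, 4), 7, dtype=np.int32)
cloud = back_project(depth, labels, np.eye(4))
```

Accumulating such labeled points over all frames (and fusing them volumetrically, as Kimera-Semantics does with a TSDF) yields the semantic dense map.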