Hybrid SLAM and object recognition on an embedded platform

Bibliographic Details
Main Author: Low, Timothy Zhi Hao
Other Authors: Lam Siew Kei
Format: Final Year Project
Language:English
Published: Nanyang Technological University 2023
Subjects:
Online Access:https://hdl.handle.net/10356/171979
Institution: Nanyang Technological University
Description
Summary:Simultaneous Localization and Mapping (SLAM) is a key localisation technique for systems such as autonomous agents. Visual SLAM is a subset of SLAM that analyzes visual information captured by cameras, enabling estimation of the camera's position and orientation while simultaneously constructing a map of the environment. Conventional visual SLAM mainly produces sparse point clouds, which fail to capture a comprehensive representation of the environment; the resulting maps lack semantic information, which limits scene understanding. Autonomous agents that utilize visual SLAM are also commonly implemented on embedded systems. The objective of this project is therefore to implement a hybrid SLAM system on an embedded platform that combines conventional SLAM algorithms with semantic segmentation to construct semantic dense maps. The hybrid SLAM system is implemented and tested on the NVIDIA Jetson Xavier NX embedded system with a ZED 2 stereo camera for input. ORB-SLAM3, a well-established visual SLAM algorithm, was chosen for estimating the camera poses. SCALE semantic segmentation is used to segment the frames provided by the ZED 2 camera. Kimera-Semantics then fuses the camera poses, semantic images, and depth images to generate a semantic dense map.
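The per-frame pipeline described in the summary (pose tracking, semantic segmentation, then fusion of poses, labels, and depth into a labelled dense map) can be sketched as follows. This is a minimal illustrative sketch: every function and class name below is a hypothetical stand-in, not the real ORB-SLAM3, SCALE, or Kimera-Semantics API.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    rgb: list    # RGB image (placeholder; a real system holds pixel arrays)
    depth: list  # per-pixel depth from the stereo camera (placeholder)

def estimate_pose(frame, prev_pose):
    """Hypothetical stand-in for the ORB-SLAM3 tracking stage.

    Real ORB-SLAM3 matches ORB features against its map; here we just
    nudge the previous pose so the sketch stays runnable.
    """
    return [p + 0.1 for p in prev_pose]

def segment(frame):
    """Hypothetical stand-in for semantic segmentation: one label per pixel."""
    return ["floor" if d > 1.0 else "obstacle" for d in frame.depth]

@dataclass
class SemanticMap:
    """Hypothetical stand-in for Kimera-Semantics-style fusion.

    Integrates pose, depth, and semantic labels into a labelled voxel grid,
    modelled here as a dict keyed by coarse world coordinates.
    """
    voxels: dict = field(default_factory=dict)

    def integrate(self, pose, depth, labels):
        for i, (d, lab) in enumerate(zip(depth, labels)):
            # Crude back-projection of each "pixel" into a voxel key.
            key = (round(pose[0] + d), round(pose[1] + i), round(pose[2]))
            self.voxels[key] = lab

# Per-frame loop: track the pose, segment the frame, fuse into the map.
pose = [0.0, 0.0, 0.0]
world = SemanticMap()
for frame in [Frame(rgb=[], depth=[0.5, 1.5]), Frame(rgb=[], depth=[2.0, 0.8])]:
    pose = estimate_pose(frame, pose)
    labels = segment(frame)
    world.integrate(pose, frame.depth, labels)

print(len(world.voxels))  # number of labelled voxels in the fused map
```

In the real system each stage runs on the Jetson Xavier NX, with the segmentation network typically accelerated on the GPU while tracking and fusion run alongside it; the sketch only shows the data flow between the three stages.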