Sensor fusion for autonomous mobile robot

Bibliographic Details
Main Author: Yu, Zhuochen
Other Authors: Andy Khong W H
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2024
Online Access:https://hdl.handle.net/10356/181815
Institution: Nanyang Technological University
Description
Summary: The sparsity of point clouds and the lack of sufficient semantic information present significant challenges for existing LiDAR-only 3D detection methods, particularly in robotic applications that demand high accuracy and efficiency. To address point cloud sparsity, recent approaches have explored the conversion of RGB images into virtual points through depth completion, enabling fusion with LiDAR data. While these methods improve point cloud density, they often introduce substantial computational overhead due to the high density of generated virtual points, and they do not fully exploit the rich semantic information from images. In this work, VKIFNet is introduced as an efficient multi-modal feature fusion framework designed to enhance 3D perception for robotic systems by integrating virtual key instances with LiDAR points across multiple stages. VKIFNet incorporates three core modules. First, SKIS (Semantic Key Instance Selection) is presented, which filters and preserves only essential virtual key instances while leveraging semantic information from virtual points. This approach significantly reduces computational demands and allows critical image-derived features to be retained in 3D space, which is crucial for efficient robotic operation. The second module is a new fusion technique called VIFF (Virtual Instance Focused Fusion), which performs multi-level fusion of virtual key instances and LiDAR data in both BEV (Bird's-Eye View) and 3D space. This fusion method enhances spatial awareness and ensures that both LiDAR and image-derived features contribute to a more robust understanding of the environment. Lastly, VIRA (Virtual-Instance-to-Real Attention) is introduced as a lightweight attention mechanism that uses the features of relevant LiDAR points to refine the virtual key instances with minimal computational overhead, optimizing the model for real-time robotic applications. VKIFNet demonstrates substantial improvements in detection performance on the KITTI and JRDB datasets, showcasing its potential for high-precision 3D perception in robotics.
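
Since this record does not include the thesis implementation, the short Python sketch below only illustrates how two of the ideas described in the summary might look in code: a SKIS-style step that keeps the top-k virtual points by semantic confidence, and a VIRA-style cross-attention step in which virtual key instances query real LiDAR point features. All names, shapes, and hyper-parameters here (skis_select, VIRABlock, k=2048, dim=64) are illustrative assumptions, not the author's code.

# Hypothetical sketch of two VKIFNet ideas from the abstract.
# Every interface and constant below is an assumption for illustration.
import torch
import torch.nn as nn


def skis_select(virtual_points, semantic_scores, k=2048):
    """SKIS-style selection (assumed): keep only the k virtual points
    with the highest semantic confidence, discarding the dense rest.

    virtual_points:  (N, 3+C) image-derived points with appended features
    semantic_scores: (N,)     per-point semantic confidence from the image
    """
    k = min(k, virtual_points.shape[0])
    _, idx = torch.topk(semantic_scores, k)   # indices of key instances
    return virtual_points[idx]                # (k, 3+C) sparse key instances


class VIRABlock(nn.Module):
    """VIRA-style refinement (assumed): lightweight cross-attention in which
    each virtual key instance queries real LiDAR point features."""

    def __init__(self, dim=64, heads=1):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, virtual_feats, lidar_feats):
        # virtual_feats: (B, Kv, dim) queries from virtual key instances
        # lidar_feats:   (B, Kl, dim) keys/values from real LiDAR points
        refined, _ = self.attn(virtual_feats, lidar_feats, lidar_feats)
        return self.norm(virtual_feats + refined)  # residual refinement


if __name__ == "__main__":
    # Toy shapes only; a real pipeline would also voxelize and fuse the
    # two modalities in BEV and 3D space (the VIFF stage).
    vpts = torch.randn(10000, 3 + 64)          # dense virtual points
    scores = torch.rand(10000)                 # semantic confidences
    key = skis_select(vpts, scores, k=2048)    # sparse key instances

    vira = VIRABlock(dim=64)
    virtual_feats = key[:, 3:].unsqueeze(0)    # (1, 2048, 64)
    lidar_feats = torch.randn(1, 4096, 64)     # encoded LiDAR features
    out = vira(virtual_feats, lidar_feats)
    print(out.shape)                           # torch.Size([1, 2048, 64])

The VIFF-style multi-level BEV/3D fusion is omitted from the sketch because the summary does not specify its internal structure.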