3D point clouds indoor object detection

Bibliographic Details
Main Author: Li, Zhuhang
Other Authors: Wen Bihan
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2023
Subjects:
Online Access: https://hdl.handle.net/10356/168966
Institution: Nanyang Technological University
Description
Summary: Computer vision has become an essential research area in the era of artificial intelligence. In past years, much computer vision research has focused on 2D images. Compared with 2D images, 3D data provides spatial geometric information, such as the location, scale, and pose of a target, that is unaffected by changes in illumination and texture. 3D object detection and recognition is a crucial technology for 3D scene understanding, with broad application prospects in autonomous driving, intelligent robotics, AR and VR, remote sensing and mapping, biomedicine, and other fields, and it has become a research hotspot in 3D vision in recent years. In indoor object detection by a LiDAR-equipped UAV, the UAV must deliver accurate detection results in a short time, yet deep neural network models for point cloud object detection often have a large number of parameters and require long data pre-processing and model inference times. To obtain a lightweight point cloud object detection model without weakening its feature extraction capability, this thesis proposes an Improved Lightweight VoteNet model. In the feature extraction part, the model uses single-scale set abstraction (SSSA) instead of multi-scale grouping, which reduces the number of network parameters and the amount of computation, and hence the feature extraction time. It also avoids repeated computation, saves computational resources, and accelerates model convergence. Additionally, a multi-layer feature jumping connection is added to SSSA to counter the weak feature extraction and missed detection of small targets caused by sparse point clouds.
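As a rough illustration of the single-scale idea described above, the sketch below samples a few centroids, groups neighbours within one radius, and max-pools their features. It is not the thesis implementation: the function names are hypothetical, the learned per-point network is replaced by a plain channel-wise maximum, and the multi-layer feature connection is omitted. A multi-scale variant would repeat the grouping step for several radii, which is where the extra parameters and computation come from.

```python
import math

def farthest_point_sampling(points, k):
    """Return k indices of well-spread points, starting from index 0."""
    chosen = [0]
    # Distance from each point to its nearest already-chosen centroid.
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen

def ball_query(points, centroid, radius):
    """Indices of all points within `radius` of `centroid`."""
    return [i for i, p in enumerate(points) if math.dist(p, centroid) <= radius]

def single_scale_set_abstraction(points, features, k, radius):
    """Group around k centroids with ONE radius and max-pool each group.

    Assumption: a real layer would pass each group through a shared
    learned network before pooling; here the raw features are pooled.
    """
    dim = len(features[0])
    centers, pooled = [], []
    for c in farthest_point_sampling(points, k):
        group = ball_query(points, points[c], radius)  # always contains c
        centers.append(points[c])
        pooled.append([max(features[i][d] for i in group) for d in range(dim)])
    return centers, pooled

# Two tight pairs of points, far apart from each other.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 0.0, 0.0), (5.1, 0.0, 0.0)]
features = [[1.0], [2.0], [3.0], [4.0]]
centers, pooled = single_scale_set_abstraction(points, features, k=2, radius=0.5)
print(centers, pooled)
```

With one radius, each centroid yields a single pooled feature vector, so the output size per centroid does not grow with the number of scales.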
The combined use of SSSA and the multi-layer feature jumping connection makes the network extremely lightweight while preserving its feature extraction capability. In the VoteNet aggregation part, the random selection of the initial aggregation points results in insufficient attention to critical clusters. To solve this problem, a dual-channel attention mechanism is proposed, consisting of a channel attention mechanism followed by a spatial attention mechanism. With dual-channel attention, the Improved Lightweight VoteNet model focuses on critical points and suppresses non-critical points in the spatial domain, and learns the importance of the feature information in different channels. Finally, the benchmark model and the Improved Lightweight VoteNet are evaluated on the SUN RGB-D dataset. Compared with the benchmark model, the mean average precision (mAP) of the Improved Lightweight VoteNet increases by 0.0035 and 0.0516 at IoU thresholds of 0.25 and 0.5, respectively. Building on this work, a RealSense L515 was used to acquire raw point cloud data, the Improved Lightweight VoteNet was used to predict object classes and bounding boxes for that data, and the point clouds and detection results were visualized.
Key words: Improved Lightweight VoteNet, single-scale set abstraction, multi-layer feature jumping connection, dual-channel attention mechanism
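To make the two IoU thresholds in the summary concrete, the following minimal sketch computes the intersection over union of two 3D boxes and checks it against 0.25 and 0.5. It assumes axis-aligned boxes given as (min corner, max corner); benchmark boxes may carry a heading angle, which this simplification ignores. The function names are illustrative only.

```python
def box_volume(box):
    """Volume of an axis-aligned box ((x0, y0, z0), (x1, y1, z1))."""
    lo, hi = box
    volume = 1.0
    for a, b in zip(lo, hi):
        volume *= max(0.0, b - a)  # an empty extent gives zero volume
    return volume

def iou_3d(box_a, box_b):
    """Intersection over union of two axis-aligned 3D boxes."""
    lo = [max(box_a[0][i], box_b[0][i]) for i in range(3)]
    hi = [min(box_a[1][i], box_b[1][i]) for i in range(3)]
    inter = box_volume((lo, hi))
    union = box_volume(box_a) + box_volume(box_b) - inter
    return inter / union if union > 0 else 0.0

# A predicted box shifted by half its width along x from the ground truth.
ground_truth = ((0.0, 0.0, 0.0), (2.0, 2.0, 2.0))
prediction = ((1.0, 0.0, 0.0), (3.0, 2.0, 2.0))
iou = iou_3d(ground_truth, prediction)
print(iou >= 0.25, iou >= 0.5)
```

Here the overlap is 4 cubic units out of a union of 12, so the prediction counts as a match at the 0.25 threshold but not at 0.5. The stricter threshold rewards tighter box localization.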