3D point clouds indoor object detection

Computer vision has become an essential research area in the artificial intelligence era. In the past years, a large amount of computer vision research has focused on 2D images. Compared with 2D images, 3D data has the advantage of providing 3D spatial geometric information, such as location, scale...

Bibliographic Details
Main Author: Li, Zhuhang
Other Authors: Wen Bihan
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2023
Subjects: Engineering::Electrical and electronic engineering
Online Access: https://hdl.handle.net/10356/168966
Institution: Nanyang Technological University
id sg-ntu-dr.10356-168966
record_format dspace
spelling sg-ntu-dr.10356-168966 2023-07-04T15:14:06Z 3D point clouds indoor object detection Li, Zhuhang Wen Bihan School of Electrical and Electronic Engineering bihan.wen@ntu.edu.sg Engineering::Electrical and electronic engineering Master of Science (Communications Engineering) 2023-06-26T02:21:15Z 2023-06-26T02:21:15Z 2023 Thesis-Master by Coursework Li, Z. (2023). 3D point clouds indoor object detection. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/168966 https://hdl.handle.net/10356/168966 en application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Electrical and electronic engineering
spellingShingle Engineering::Electrical and electronic engineering
Li, Zhuhang
3D point clouds indoor object detection
description Computer vision has become an essential research area in the artificial intelligence era. In past years, a large amount of computer vision research has focused on 2D images. Compared with 2D images, 3D data has the advantage of providing 3D spatial geometric information, such as the location, scale and pose of the target, regardless of changes in illumination and texture. 3D object detection and recognition is a crucial technology for 3D scene understanding, with broad application prospects in autonomous driving, intelligent robotics, AR and VR, remote sensing mapping, biomedicine, and other fields, and it has become a research hotspot in 3D vision in recent years. In indoor object detection by a UAV equipped with LiDAR, the UAV must return accurate detection results in a short time, but deep neural network models for point cloud object detection often have a large number of parameters and require a long time for data pre-processing and model inference. To obtain a lightweight point cloud object detection model without sacrificing feature extraction capability, this thesis proposes an Improved Lightweight VoteNet model. In the feature extraction part, the model uses single-scale set abstraction (SSSA) instead of multi-scale grouping, which reduces the number of network parameters and the amount of computation, and hence the feature extraction time. It also avoids repeated computation, saves computational resources, and speeds up convergence. Additionally, a multi-layer feature jumping (skip) connection is added to SSSA to avoid weak feature extraction and missed detections of small targets caused by sparse point clouds. Used together, SSSA and the multi-layer feature jumping connection make the network extremely lightweight while preserving feature extraction capability. In the VoteNet aggregation part, the random selection of the initial aggregation points leads to insufficient attention to critical clusters. To solve this problem, a dual-channel attention mechanism is proposed, which applies a channel attention mechanism followed by a spatial attention mechanism. With dual-channel attention, the Improved Lightweight VoteNet focuses on critical points and suppresses non-critical points in the spatial domain, and learns the importance of the feature information in different channels. Finally, the benchmark model and the Improved Lightweight VoteNet are evaluated on the SUN RGB-D dataset. The mean average precision (mAP) of the Improved Lightweight VoteNet is higher than that of the benchmark model by 0.0035 and 0.0516 at IoU thresholds of 0.25 and 0.5, respectively. Building on this work, a RealSense L515 was used to capture raw point cloud data, the Improved Lightweight VoteNet was used to predict object classes and bounding boxes for that data, and the point clouds and detection results were visualized. Keywords: Improved Lightweight VoteNet, single-scale set abstraction, multi-layer feature jumping connection, dual-channel attention mechanism
author2 Wen Bihan
author_facet Wen Bihan
Li, Zhuhang
format Thesis-Master by Coursework
author Li, Zhuhang
author_sort Li, Zhuhang
title 3D point clouds indoor object detection
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/168966
_version_ 1772828514752397312
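
The description above mentions an attention module that applies channel attention and then spatial attention. The record does not give the thesis's exact formulation, so the following is only a minimal Python sketch of that channel-then-spatial ordering on per-point features. The function names, the pooling choices, the single linear layer `weight`, and the scalars `w_avg` and `w_max` are all assumptions made for illustration, not the thesis's implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(features, weight):
    """Rescale each channel by a gate computed from statistics pooled over all points.

    features: list of N points, each a list of C floats.
    weight:   hypothetical C x C linear layer standing in for a learned MLP.
    """
    num_channels = len(features[0])
    columns = [[point[c] for point in features] for c in range(num_channels)]
    # Average- and max-pool every channel over the points, then sum the two.
    pooled = [sum(col) / len(col) + max(col) for col in columns]
    gates = [sigmoid(sum(w * p for w, p in zip(row, pooled))) for row in weight]
    return [[value * gate for value, gate in zip(point, gates)] for point in features]


def spatial_attention(features, w_avg, w_max):
    """Rescale each point by a gate computed from statistics pooled over its channels."""
    attended = []
    for point in features:
        gate = sigmoid(w_avg * sum(point) / len(point) + w_max * max(point))
        attended.append([value * gate for value in point])
    return attended


def dual_attention(features, weight, w_avg, w_max):
    """Apply channel attention first, then spatial attention, in sequence."""
    return spatial_attention(channel_attention(features, weight), w_avg, w_max)


# Hypothetical input: 3 points with 2 feature channels each.
points = [[1.0, 0.0], [3.0, 0.0], [2.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
attended = dual_attention(points, identity, w_avg=1.0, w_max=1.0)
```

Because every gate lies strictly between 0 and 1, the output keeps the input shape and each positive feature is scaled down, with points and channels that have larger pooled responses scaled down the least.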
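
The description above reports mAP at IoU thresholds of 0.25 and 0.5. As a rough illustration of what those thresholds mean, the sketch below computes IoU for axis-aligned 3D boxes and checks a hypothetical prediction against both thresholds. Axis-aligned boxes are a simplifying assumption (SUN RGB-D annotations include a heading angle), and the box values and function names are made up for the example.

```python
def box_volume(box):
    """Volume of an axis-aligned box given as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    xmin, ymin, zmin, xmax, ymax, zmax = box
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin) * max(0.0, zmax - zmin)


def iou_3d(box_a, box_b):
    """Intersection over union of two axis-aligned 3D boxes."""
    overlap = 1.0
    for axis in range(3):
        lo = max(box_a[axis], box_b[axis])
        hi = min(box_a[axis + 3], box_b[axis + 3])
        # The overlap along an axis is clamped to zero when the boxes are disjoint.
        overlap *= max(0.0, hi - lo)
    union = box_volume(box_a) + box_volume(box_b) - overlap
    return overlap / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold):
    """A prediction matches a ground-truth box when their IoU reaches the threshold."""
    return iou_3d(pred_box, gt_box) >= threshold


# Hypothetical ground-truth and predicted boxes (metres), shifted along x.
gt = (0.0, 0.0, 0.0, 2.0, 1.0, 1.0)
pred = (1.0, 0.0, 0.0, 3.0, 1.0, 1.0)

print(iou_3d(pred, gt))
print(is_true_positive(pred, gt, 0.25), is_true_positive(pred, gt, 0.5))
```

Here the two boxes share one third of their union, so the prediction counts as a match at the 0.25 threshold but not at 0.5, which is why mAP at 0.5 is the stricter of the two figures.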