3D point clouds indoor object detection

Computer vision has become an essential research area in the artificial intelligence era. In the past years, a large amount of computer vision research has focused on 2D images. Compared with 2D images, 3D data has the advantage of providing 3D spatial geometric information, such as location, scale...

Bibliographic Details
Main Author:	Li, Zhuhang
Other Authors:	Wen Bihan
Format:	Thesis-Master by Coursework
Language:	English
Published:	Nanyang Technological University 2023
Subjects:	Engineering::Electrical and electronic engineering
Online Access:	https://hdl.handle.net/10356/168966
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-168966
record_format	dspace
spelling	sg-ntu-dr.10356-1689662023-07-04T15:14:06Z 3D point clouds indoor object detection Li, Zhuhang Wen Bihan School of Electrical and Electronic Engineering bihan.wen@ntu.edu.sg Engineering::Electrical and electronic engineering Master of Science (Communications Engineering) 2023-06-26T02:21:15Z 2023-06-26T02:21:15Z 2023 Thesis-Master by Coursework Li, Z. (2023). 3D point clouds indoor object detection. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/168966 https://hdl.handle.net/10356/168966 en application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Electrical and electronic engineering
spellingShingle	Engineering::Electrical and electronic engineering Li, Zhuhang 3D point clouds indoor object detection
description	Computer vision has become an essential research area in the artificial intelligence era. In past years, a large amount of computer vision research has focused on 2D images. Compared with 2D images, 3D data provides 3D spatial geometric information, such as the location, scale and pose of the target, regardless of changes in illumination and texture. 3D object detection and recognition is a crucial technology for 3D scene understanding and has broad application prospects in autonomous driving, intelligent robotics, AR & VR, remote sensing mapping, biomedicine, and other fields. 3D object detection has become a research hotspot in 3D vision in recent years. In the scenario of indoor object detection by a UAV with LiDAR, the UAV is required to give accurate detection results in a short time, but deep neural network models for point cloud target detection often have a large number of parameters and require a long time for data pre-processing and model inference. To achieve a lightweight point cloud target detection model without compromising its feature extraction capability, an Improved Lightweight VoteNet model is proposed in this thesis. In the feature extraction part, the model uses single-scale set abstraction (SSSA) instead of multi-scale grouping, which reduces the number of network parameters and the amount of computation, and hence the feature extraction time. It also avoids repeated computation, saves computational resources, and accelerates the convergence of the model. Additionally, a multi-layer feature jumping connection is added to SSSA to avoid weak feature extraction and missed detection of small targets caused by sparse point clouds.
The combined use of SSSA and the multi-layer feature jumping connection makes the network extremely lightweight while preserving feature extraction capability. In the VoteNet aggregation part, the random selection of the initial aggregation points results in insufficient attention to critical clusters. To solve this problem, a dual-channel attention mechanism is proposed, which consists of a channel attention mechanism and a spatial attention mechanism applied in sequence. With dual-channel attention, the Improved Lightweight VoteNet focuses on critical points and suppresses non-critical points in the spatial domain, and learns the importance of the feature information in different channels. Finally, the benchmark model and the Improved Lightweight VoteNet are evaluated on the SUNRGBD dataset. Compared with the benchmark model, the mean average precision (mAP) of the Improved Lightweight VoteNet increases by 0.0035 and 0.0516 at IoU thresholds of 0.25 and 0.5, respectively. Building on this work, a RealSense L515 was used to obtain raw point cloud data, the Improved Lightweight VoteNet was used to predict object classes and bounding boxes for the point cloud data, and the point cloud data and detection results were visualized. Key words: Improved Lightweight VoteNet, single-scale set abstraction, multi-layer feature jumping connection, dual-channel attention mechanism
author2	Wen Bihan
author_facet	Wen Bihan Li, Zhuhang
format	Thesis-Master by Coursework
author	Li, Zhuhang
author_sort	Li, Zhuhang
title	3D point clouds indoor object detection
title_short	3D point clouds indoor object detection
title_full	3D point clouds indoor object detection
title_fullStr	3D point clouds indoor object detection
title_full_unstemmed	3D point clouds indoor object detection
title_sort	3d point clouds indoor object detection
publisher	Nanyang Technological University
publishDate	2023
url	https://hdl.handle.net/10356/168966
_version_	1772828514752397312

Similar Items