4D point cloud semantic segmentation
Format: Thesis (Doctor of Philosophy)
Language: English
Published: Nanyang Technological University, 2023
Online Access: https://hdl.handle.net/10356/172100
Summary:

3D point cloud semantic segmentation is a fundamental scene understanding task. Typical 3D point cloud semantic segmentation approaches analyze the 3D information of LiDAR point clouds and predict the class of every point in a point cloud scene. However, existing 3D-based approaches still fall short of the accuracy that real-world applications require. Since environments in the wild are dynamic, temporal information is an important clue for identifying dynamic objects and can potentially enhance 3D segmentation models. Therefore, 4D point cloud semantic segmentation is proposed to fully exploit the temporal and spatial information in 4D point clouds and improve upon existing 3D methods.
On the other hand, training an effective 4D or 3D segmentation model requires a huge amount of data, while manually annotating point clouds is expensive. Weakly supervised segmentation approaches on 4D point clouds aim to train segmentation models with minimal annotation. In this thesis, we study fully supervised 4D point cloud semantic segmentation and, further, weakly supervised segmentation methods on 4D point clouds.
4D point cloud segmentation predicts the label of every point in 3D point cloud sequences, i.e., 4D point clouds. The temporal information in 4D point clouds is crucial for robotic systems to recognize dynamic objects. However, 4D point cloud segmentation is still under-investigated, and existing approaches suffer from low efficiency and performance. To address this problem, we propose a novel framework, SpSequenceNet, that fuses information from previous frames into the target frame; it is presented in Chapter 3. The network is built on 3D sparse convolution and includes two novel modules: a Cross-frame Global Attention (CGA) module and a Cross-frame Local Interpolation (CLI) module. These modules capture spatial and temporal information from previous frames to enhance the predictions for the current frame. CGA emphasizes important target-frame features using a global summary of the previous-frame features, while CLI interpolates features from local regions of the previous frame to enhance the features of the target frame. However, we observe that the overall improvement of SpSequenceNet is still not satisfactory.
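To make the cross-frame gating idea concrete, the following is a minimal PyTorch sketch: it pools the previous-frame features into a global summary and uses it to re-weight the channels of the target-frame features. It operates on dense per-point feature matrices rather than the sparse-convolution tensors used in the thesis, and the module name and shapes are illustrative assumptions, not the thesis's exact CGA implementation.

```python
import torch
import torch.nn as nn

class CrossFrameGlobalGate(nn.Module):
    """Sketch of a CGA-style module: gate target-frame features with a
    global summary of the previous frame (illustrative only)."""

    def __init__(self, channels: int):
        super().__init__()
        # Small MLP mapping the previous-frame summary to per-channel gates.
        self.gate = nn.Sequential(
            nn.Linear(channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, channels),
            nn.Sigmoid(),
        )

    def forward(self, prev_feats: torch.Tensor, tgt_feats: torch.Tensor) -> torch.Tensor:
        # prev_feats: (N_prev, C), tgt_feats: (N_tgt, C) per-point features.
        summary = prev_feats.mean(dim=0)   # global summary of previous frame, (C,)
        gates = self.gate(summary)         # per-channel importance, (C,)
        # Emphasize target channels that the previous frame marks as important.
        return tgt_feats * gates
```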
In Chapter 4, we extend SpSequenceNet to incorporate additional information, namely temporal variation information and point-level detail. Building on CLI, we design a temporal variation-aware interpolation that improves the segmentation of high-speed objects. We also design a temporal voxel-point refinement module that refines the predictions with point-level information. Furthermore, in Chapter 5, we propose a novel module, FeatProp, to capture more temporal information.
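As a rough illustration of cross-frame local interpolation with variation awareness, the sketch below interpolates previous-frame features at target-frame points via k-nearest neighbours and down-weights neighbours that have moved far, as a crude proxy for temporal variation. The weighting scheme and function name are assumptions made for illustration; they are not the exact formulation in the thesis.

```python
import torch

def variation_aware_interpolation(prev_xyz, prev_feats, tgt_xyz, k=3, tau=1.0):
    """Sketch: interpolate previous-frame features at target-frame points,
    down-weighting neighbours with large displacement (illustrative only)."""
    # Pairwise distances between target and previous points: (N_tgt, N_prev).
    dists = torch.cdist(tgt_xyz, prev_xyz)
    # k nearest previous-frame neighbours of each target point.
    knn_d, knn_i = dists.topk(k, dim=1, largest=False)
    # Larger cross-frame displacement -> smaller interpolation weight.
    w = torch.softmax(-knn_d / tau, dim=1)       # (N_tgt, k)
    neigh = prev_feats[knn_i]                    # (N_tgt, k, C)
    return (w.unsqueeze(-1) * neigh).sum(dim=1)  # (N_tgt, C)
```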
Overall, we design three novel approaches that enhance the features of target frames by extracting different kinds of temporal information from local and global regions. Experimental results demonstrate that our frameworks achieve superior performance in 4D semantic segmentation.
For weakly supervised segmentation on 4D point clouds, we first propose a new weakly supervised training task with only 0.001% initial annotations; this task is introduced in Chapter 6. Specifically, we divide 4D point cloud datasets into a series of 100-frame sequences. We then sample around 0.1% of the points in the first frame of each sequence and annotate them as the initial annotations; since only the first of every 100 frames is labelled, this amounts to roughly 0.1% / 100 = 0.001% of all points. In such a weak setting, exploiting the huge amount of unannotated frames is the core challenge in learning effective models. Hence, we propose a novel temporal-spatial framework, W4DTS, that uses the annotated frames to generate high-quality pseudo labels for the unannotated frames, and we train our models with these pseudo labels. In W4DTS, a temporal matching module selects the most confident points as pseudo-annotated points, and a spatial graph propagation module propagates the label information of the initial annotations and the pseudo-annotated points to the relevant point cloud frames, generating more pseudo labels.

However, we observe that global label propagation easily propagates noise and errors; worse, these errors produce further false pseudo labels in the next frame through the temporal matching module. In Chapter 7, we therefore propose a novel approach, Progressive 4D Grouping (P4G), which improves the final model through higher pseudo-label quality. P4G groups annotated and highly confident unannotated points in each 3D point cloud sequence and generates high-quality pseudo labels from very sparse annotations. To further improve progressive 4D grouping, we design cross-frame contrastive learning and local consistency learning to enhance the quality of the 4D grouping. Our experimental results show that P4G achieves state-of-the-art performance.
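The pseudo-label selection in a temporal matching module can be sketched as follows: labels are transferred from an annotated frame to the next frame only for points that lie close to a labelled point and that the model predicts both confidently and consistently with the matched annotation. The thresholds and function name here are illustrative assumptions rather than the thesis's actual values.

```python
import torch

def temporal_matching_pseudo_labels(lab_xyz, labels, unlab_xyz, probs,
                                    max_dist=0.2, min_conf=0.9):
    """Sketch: confidence-filtered label transfer between frames
    (illustrative only)."""
    # Nearest labelled point for every unlabelled point.
    nn_d, nn_i = torch.cdist(unlab_xyz, lab_xyz).min(dim=1)
    conf, pred = probs.max(dim=1)  # model confidence and predicted class
    # Keep a pseudo label only when the match is spatially close, the model
    # is confident, and the prediction agrees with the matched annotation.
    keep = (nn_d < max_dist) & (conf > min_conf) & (pred == labels[nn_i])
    pseudo = torch.full((unlab_xyz.shape[0],), -1, dtype=torch.long)
    pseudo[keep] = pred[keep]
    return pseudo  # -1 marks points left unlabelled
```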
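Cross-frame contrastive learning can likewise be sketched with a standard InfoNCE-style objective: a point and its match in the adjacent frame form a positive pair, while all other points in that frame serve as negatives. This is a generic formulation assumed for illustration; the thesis's exact loss may differ.

```python
import torch
import torch.nn.functional as F

def cross_frame_infonce(feats_a, feats_b, temperature=0.07):
    """Sketch: InfoNCE loss over matched points of two frames, where
    feats_a[i] and feats_b[i] are features of corresponding points."""
    a = F.normalize(feats_a, dim=1)
    b = F.normalize(feats_b, dim=1)
    logits = a @ b.t() / temperature                     # pairwise similarities
    targets = torch.arange(a.shape[0], device=a.device)  # positives on diagonal
    return F.cross_entropy(logits, targets)
```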