Tackling background ambiguities in multi-class few-shot point cloud semantic segmentation

Bibliographic Details
Main Authors: Lai, Lvlong, Chen, Jian, Zhang, Chi, Zhang, Zehong, Lin, Guosheng, Wu, Qingyao
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2022
Online Access: https://hdl.handle.net/10356/163370
Institution: Nanyang Technological University
Description
Summary: Few-shot point cloud semantic segmentation learns to segment novel classes from scarce labeled samples. Within an episode, a novel target class is defined by a few support samples with corresponding binary masks, where only the points of this class are labeled as foreground and all others are regarded as background. In tasks involving multiple target classes, since the meaning of background differs across target classes, background ambiguities arise: some points labeled as background in one support sample may belong to other target classes. This results in incorrect guidance and degrades the model's segmentation performance, yet previous methods in the literature do not consider this problem. In this paper, we propose a simple yet effective approach to tackle background ambiguities, which adds the entropy of predictions on query samples to the training objective function as an additional regularization term. In addition, we design a feature transformation operation to reduce the feature differences between support and query samples. With our proposed approach, fine-tuning, a weak baseline method for few-shot segmentation, gains significant performance improvements (e.g., 7.48% and 7.04% in the 2-way-1-shot and 3-way-1-shot tasks of S3DIS, respectively) and outperforms current state-of-the-art methods in all task settings on the S3DIS and ScanNet benchmark datasets.
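
A minimal sketch of the entropy regularization described in the summary, assuming PyTorch-style tensors. The function and parameter names (query_logits, query_labels, lambda_ent) and the way the entropy term is combined with a standard cross-entropy loss are illustrative assumptions, not the authors' exact training objective.

```python
import torch
import torch.nn.functional as F

def entropy_regularized_loss(query_logits, query_labels, lambda_ent=0.1):
    """Cross-entropy loss on query points plus an entropy penalty on the
    query predictions, sketching the regularization described above.

    query_logits: (num_points, num_classes) raw class scores for query points
    query_labels: (num_points,) ground-truth class indices
    lambda_ent:   hypothetical weight for the entropy term (assumed value)
    """
    # Standard segmentation loss on the query set
    ce_loss = F.cross_entropy(query_logits, query_labels)

    # Entropy of the per-point predicted class distributions; penalizing it
    # encourages confident predictions on points whose background label is
    # ambiguous across target classes
    probs = F.softmax(query_logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=-1).mean()

    return ce_loss + lambda_ent * entropy
```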