MPT-Net: mask point transformer network for large scale point cloud semantic segmentation

Point cloud semantic segmentation is important for road scene perception, a task driverless vehicles must master to achieve full-fledged autonomy. In this work, we introduce the Mask Point Transformer Network (MPT-Net), a novel architecture for point cloud segmentation that is simple to implement. MPT-Net consists of a local and global feature encoder and a transformer-based decoder: a 3D point-voxel convolution encoder backbone with voxel self-attention to encode features, and a Mask Point Transformer (MPT) module to decode point features and segment the point cloud. First, we introduce the novel MPT, designed specifically for point cloud segmentation. MPT offers two benefits: it attends to every point in the point cloud using mask tokens to extract class-specific features globally via cross-attention, and it provides inter-class information exchange through self-attention over the learned mask tokens. Second, we design a backbone that uses sparse point-voxel convolutional blocks together with a transformer-based self-attention block to learn local and global contextual features. We evaluate MPT-Net on large-scale outdoor driving-scene point cloud datasets, SemanticKITTI and nuScenes. Our experiments show that, by replacing the standard segmentation head with MPT, MPT-Net outperforms our baseline approach by 3.8% on SemanticKITTI, achieving state-of-the-art performance, and is highly effective at detecting 'stuff' classes in point clouds.
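
The abstract describes the MPT decoder only at a high level. As a purely illustrative aid, the PyTorch sketch below shows one plausible way such a mask-token decoder could be wired: learned per-class mask tokens cross-attend to all point features, exchange information among themselves via self-attention, and yield per-point logits by similarity with the point features. The class name, layer choices, dimensions and logit computation are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MaskPointTransformerHead(nn.Module):
    """Sketch of a mask-token segmentation head for point clouds.

    Hyperparameters and layer choices are assumptions; the paper's
    actual MPT module may differ.
    """

    def __init__(self, num_classes: int, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # One learnable mask token per semantic class.
        self.mask_tokens = nn.Parameter(torch.randn(num_classes, dim))
        # Cross-attention: mask tokens (queries) attend to every point feature.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Self-attention over the tokens for inter-class information exchange.
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, point_feats: torch.Tensor) -> torch.Tensor:
        # point_feats: (B, N, dim) features from the encoder backbone.
        B = point_feats.shape[0]
        tokens = self.mask_tokens.unsqueeze(0).expand(B, -1, -1)   # (B, C, dim)

        # Each class token gathers class-specific evidence from all points.
        attn_out, _ = self.cross_attn(tokens, point_feats, point_feats)
        tokens = self.norm1(tokens + attn_out)

        # Tokens exchange information with one another.
        attn_out, _ = self.self_attn(tokens, tokens, tokens)
        tokens = self.norm2(tokens + attn_out)

        # Per-point class logits: similarity between point features and tokens.
        return torch.einsum("bnd,bcd->bnc", point_feats, tokens)   # (B, N, C)


if __name__ == "__main__":
    head = MaskPointTransformerHead(num_classes=19, dim=256)   # e.g. 19 SemanticKITTI classes
    feats = torch.randn(2, 4096, 256)                          # 2 scenes, 4096 points each
    print(head(feats).shape)                                    # torch.Size([2, 4096, 19])
```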

Bibliographic Details
Main Authors: Tang, Zhe Jun; Cham, Tat-Jen
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2023
Subjects: Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision; Point Cloud Compression; Representation Learning
Online Access:https://hdl.handle.net/10356/172661
Institution: Nanyang Technological University
Record ID: sg-ntu-dr.10356-172661
Conference: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Citation: Tang, Z. J. & Cham, T. (2022). MPT-Net: mask point transformer network for large scale point cloud semantic segmentation. 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 10611-10618. https://dx.doi.org/10.1109/IROS47612.2022.9981809
DOI: 10.1109/IROS47612.2022.9981809
ISBN: 9781665479271
Scopus ID: 2-s2.0-85146358620
Pages: 10611-10618
Rights: © 2022 IEEE. All rights reserved.
Collection: DR-NTU (NTU Library, Nanyang Technological University, Singapore)