Learning multi-modal scale-aware attentions for efficient and robust road segmentation
Multi-modal fusion has proven beneficial for road segmentation in autonomous driving, where depth is commonly used as complementary data to RGB images to provide robust 3D geometry information. Existing methods adopt an encoder-decoder structure and fuse the two modalities by encoding and concatenating high-level and low-level features. However, this widens the semantic gaps not only between modalities but also across scales, which is detrimental to road segmentation. To overcome this challenge and obtain robust features, we propose a Multi-modal Scale-aware Attention Network (MSAN) that fuses RGB and depth data effectively via a novel transformer-based cross-attention module, the Multi-modal Scale-aware Transformer (MST), which fuses RGB-D features across multiple scales at the encoder stage. To better consolidate features at different scales, we further propose a Scale-aware Attention Module (SAM) that captures channel-wise attention for cross-scale fusion. The two attention-based modules exploit the complementarity of the modalities and the differing importance of scales to narrow these gaps for road segmentation. Extensive experiments demonstrate that our method achieves competitive segmentation performance at a low computational cost.
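The abstract names the two fusion modules but does not spell out their internals. As a rough, hypothetical sketch of the mechanisms it describes (cross-attention between RGB and depth features, and channel-wise attention over multiple scales), the PyTorch snippet below illustrates one plausible reading; the class names, shapes, and the single-head / squeeze-and-excitation design are assumptions, not the thesis's actual MST/SAM implementation.

```python
# Hypothetical illustration of the two fusion ideas described in the abstract:
# (1) cross-attention that lets RGB features attend to depth features, and
# (2) channel-wise attention that reweights features gathered from several scales.
# This is NOT the thesis code; every name and hyperparameter here is assumed.
import torch
import torch.nn as nn


class CrossModalAttention(nn.Module):
    """Single-head cross-attention: RGB tokens query depth tokens."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, rgb_tokens: torch.Tensor, depth_tokens: torch.Tensor) -> torch.Tensor:
        # rgb_tokens, depth_tokens: (B, N, C) flattened feature maps of one encoder stage
        q, k, v = self.q(rgb_tokens), self.k(depth_tokens), self.v(depth_tokens)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return rgb_tokens + self.proj(attn @ v)  # residual fusion of depth cues into RGB


class ScaleAttention(nn.Module):
    """Squeeze-and-excitation style channel weighting over concatenated scales."""

    def __init__(self, total_channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(total_channels, total_channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(total_channels // reduction, total_channels),
            nn.Sigmoid(),
        )

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # feats: per-scale maps (B, C_i, H, W), already resized to a common H x W
        x = torch.cat(feats, dim=1)              # (B, sum(C_i), H, W)
        w = self.fc(x.mean(dim=(2, 3)))          # global average pool -> channel weights
        return x * w[:, :, None, None]           # emphasize the more informative scales
```

In an encoder-decoder segmenter, one cross-attention block per encoder stage followed by a single scale-attention step before the decoder would be one plausible arrangement of these pieces; the actual MST and SAM designs in the thesis may differ substantially.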
Saved in:
Main Author: | Zhou, Yunjiao |
---|---|
Other Authors: | Xie Lihua (School of Electrical and Electronic Engineering) |
Format: | Thesis-Master by Coursework |
Degree: | Master of Science (Computer Control and Automation) |
Language: | English |
Published: | Nanyang Technological University, 2022 |
Subjects: | Engineering::Electrical and electronic engineering::Computer hardware, software and systems |
Online Access: | https://hdl.handle.net/10356/159277 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-159277 |
---|---|
record_format | dspace |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | Engineering::Electrical and electronic engineering::Computer hardware, software and systems |
author | Zhou, Yunjiao |
author2 | Xie Lihua |
format | Thesis-Master by Coursework |
publisher | Nanyang Technological University |
publishDate | 2022 |
url | https://hdl.handle.net/10356/159277 |
_version_ | 1772825843179978752 |