Unified information fusion network for multi-modal RGB-D and RGB-T salient object detection

The use of complementary information, namely depth or thermal data, has proven beneficial for salient object detection (SOD) in recent years. However, RGB-D and RGB-T SOD are currently treated as separate problems, and most existing methods directly extract and fuse raw features from backbones. Such methods are easily restricted by low-quality modality data and redundant cross-modal features. In this work, a unified end-to-end framework is designed to handle both RGB-D and RGB-T SOD tasks. Specifically, to effectively exploit multi-modal features, we propose a novel multi-stage and multi-scale fusion network (MMNet), which consists of a cross-modal multi-stage fusion module (CMFM) and a bi-directional multi-scale decoder (BMD). Similar to the color stage doctrine of the human visual system (HVS), the proposed CMFM explores important feature representations in the feature response stage and integrates them into cross-modal features in the adversarial combination stage. Moreover, the proposed BMD learns the combination of multi-level cross-modal fused features to capture both local and global information of salient objects, further boosting multi-modal SOD performance. The proposed unified cross-modality feature analysis framework, based on two-stage and multi-scale information fusion, can be applied to diverse multi-modal SOD tasks. Comprehensive experiments (∼92K image pairs) demonstrate that the proposed method consistently outperforms 21 other state-of-the-art methods on nine benchmark datasets. This validates that the method generalizes well and robustly across diverse multi-modal SOD tasks and provides a strong multi-modal SOD benchmark.
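To make the architecture described above more concrete, the following is a minimal, illustrative PyTorch sketch of a unified RGB-X (RGB-D or RGB-T) saliency pipeline: two encoders, a per-level cross-modal fusion block standing in for the CMFM's two-stage fusion, and a simplified top-down decoder in place of the bi-directional multi-scale decoder (BMD). All class names, channel widths, and layer choices here are assumptions for exposition only; this is not the authors' MMNet implementation.

```python
# Illustrative sketch only; module names and layer choices are assumptions,
# not the released MMNet code.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Toy stand-in for the cross-modal multi-stage fusion module (CMFM):
    stage 1 re-weights each modality's response, stage 2 combines them."""

    def __init__(self, channels):
        super().__init__()
        # Stage 1: per-modality channel gating (a simple "feature response" stage).
        self.rgb_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.aux_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        # Stage 2: combine the re-weighted modalities into one fused feature.
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, rgb_feat, aux_feat):
        rgb_feat = rgb_feat * self.rgb_gate(rgb_feat)
        aux_feat = aux_feat * self.aux_gate(aux_feat)
        return self.fuse(torch.cat([rgb_feat, aux_feat], dim=1))


class UnifiedRGBXSaliency(nn.Module):
    """Schematic unified RGB-D / RGB-T pipeline: two encoders, per-level fusion,
    and a simplified top-down multi-scale decoder producing a saliency map."""

    def __init__(self, channels=(64, 128, 256)):
        super().__init__()

        def down(c_in, c_out):  # one encoder stage: conv + downsample + ReLU
            return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1), nn.ReLU())

        # Auxiliary stream takes a one-channel depth (RGB-D) or thermal (RGB-T) map.
        self.rgb_enc = nn.ModuleList(
            [down(3, channels[0]), down(channels[0], channels[1]), down(channels[1], channels[2])])
        self.aux_enc = nn.ModuleList(
            [down(1, channels[0]), down(channels[0], channels[1]), down(channels[1], channels[2])])
        self.fusions = nn.ModuleList([CrossModalFusion(c) for c in channels])
        # Decoder convs merge upsampled deep features with shallower fused ones.
        self.dec = nn.ModuleList([
            nn.Conv2d(channels[2] + channels[1], channels[1], 3, padding=1),
            nn.Conv2d(channels[1] + channels[0], channels[0], 3, padding=1)])
        self.head = nn.Conv2d(channels[0], 1, 1)

    def forward(self, rgb, aux):
        fused, x, y = [], rgb, aux
        for rgb_block, aux_block, fusion in zip(self.rgb_enc, self.aux_enc, self.fusions):
            x, y = rgb_block(x), aux_block(y)
            fused.append(fusion(x, y))          # fused features at 1/2, 1/4, 1/8 scale
        up = nn.functional.interpolate
        d = fused[2]
        d = torch.relu(self.dec[0](torch.cat([up(d, scale_factor=2), fused[1]], dim=1)))
        d = torch.relu(self.dec[1](torch.cat([up(d, scale_factor=2), fused[0]], dim=1)))
        return torch.sigmoid(self.head(up(d, scale_factor=2)))  # saliency map at input size


if __name__ == "__main__":
    model = UnifiedRGBXSaliency()
    rgb = torch.randn(1, 3, 224, 224)   # RGB image
    aux = torch.randn(1, 1, 224, 224)   # depth (RGB-D) or thermal (RGB-T) map
    print(model(rgb, aux).shape)        # torch.Size([1, 1, 224, 224])
```

For RGB-T input, the same auxiliary stream simply receives the one-channel thermal map instead of a depth map; that interchangeability is the point of a unified RGB-D/RGB-T design.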

Bibliographic Details
Main Authors: Gao, Wei, Liao, Guibiao, Ma, Siwei, Li, Ge, Liang, Yongsheng, Lin, Weisi
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: IEEE Transactions on Circuits and Systems for Video Technology, 2021
Subjects: Engineering::Computer science and engineering; Dynamic Cross-Modal Guided Mechanism; RGBD/RGB-T Multi-Modal Data
Online Access:https://hdl.handle.net/10356/150772
Institution: Nanyang Technological University
Record ID: sg-ntu-dr.10356-150772
Type: Journal Article (accepted version)
Citation: Gao, W., Liao, G., Ma, S., Li, G., Liang, Y. & Lin, W. (2021). Unified information fusion network for multi-modal RGB-D and RGB-T salient object detection. IEEE Transactions on Circuits and Systems for Video Technology. https://dx.doi.org/10.1109/TCSVT.2021.3082939
ISSN: 1051-8215
DOI: 10.1109/TCSVT.2021.3082939
Date deposited: 2021-12-09
Collection: DR-NTU (NTU Library, Nanyang Technological University)
File format: application/pdf
Funding: This work was supported by Ministry of Science and Technology of China - Science and Technology Innovations 2030 (2019AAA0103501), Natural Science Foundation of China (61801303 and 62031013), Guangdong Basic and Applied Basic Research Foundation (2019A1515012031), and Shenzhen Science and Technology Plan Basic Research Project (JCYJ20190808161805519).
Rights: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/TCSVT.2021.3082939.