Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T salient object detection

RGB-Thermal Salient Object Detection (RGB-T SOD) aims to pinpoint prominent objects within aligned pairs of visible and thermal infrared images. A key challenge lies in bridging the inherent disparities between RGB and Thermal modalities for effective saliency map prediction. Traditional encoder-decoder architectures, while designed for cross-modality feature interactions, may not have adequately considered the robustness against noise originating from defective modalities, thereby leading to suboptimal performance in complex scenarios. Inspired by hierarchical human visual systems, we propose ConTriNet, a robust Confluent Triple-Flow Network employing a "Divide-and-Conquer" strategy. This framework utilizes a unified encoder with specialized decoders, each addressing different subtasks of exploring modality-specific and modality-complementary information for RGB-T SOD, thereby enhancing the final saliency map prediction. Specifically, ConTriNet comprises three flows: two modality-specific flows explore cues from RGB and Thermal modalities, and a third modality-complementary flow integrates cues from both modalities. ConTriNet presents several notable advantages. It incorporates a Modality-induced Feature Modulator (MFM) in the modality-shared union encoder to minimize inter-modality discrepancies and mitigate the impact of defective samples. Additionally, a foundational Residual Atrous Spatial Pyramid Module (RASPM) in the separated flows enlarges the receptive field, allowing for the capture of multi-scale contextual information. Furthermore, a Modality-aware Dynamic Aggregation Module (MDAM) in the modality-complementary flow dynamically aggregates saliency-related cues from both modality-specific flows. Leveraging the proposed parallel triple-flow framework, we further refine saliency maps derived from different flows through a flow-cooperative fusion strategy, yielding a high-quality, full-resolution saliency map for the final prediction. To evaluate the robustness and stability of our approach, we collect a comprehensive RGB-T SOD benchmark, VT-IMAG, covering various real-world challenging scenarios. Extensive experiments on public benchmarks and our VT-IMAG dataset demonstrate that ConTriNet consistently outperforms state-of-the-art competitors in both common and challenging scenarios, even when dealing with incomplete modality data.
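As a rough illustration of the divide-and-conquer data flow described above (not the paper's implementation), the toy NumPy sketch below runs two modality-specific "flows", a complementary flow that aggregates both modalities with per-pixel softmax weights, and a final flow-cooperative fusion. All function names (`decode_flow`, `dynamic_aggregate`, `contrinet_sketch`) and the weighting scheme are hypothetical stand-ins for ConTriNet's learned modules (MFM, RASPM, MDAM).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_flow(feat):
    # Toy modality-specific decoder: collapse the channel axis
    # of a (C, H, W) feature map into one saliency map.
    return sigmoid(feat.mean(axis=0))

def dynamic_aggregate(rgb_feat, t_feat):
    # Hypothetical stand-in for MDAM: per-pixel softmax weights over
    # the two modalities, derived from their mean activations.
    scores = np.stack([rgb_feat.mean(axis=0), t_feat.mean(axis=0)])
    w = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)
    return w[0] * rgb_feat.mean(axis=0) + w[1] * t_feat.mean(axis=0)

def contrinet_sketch(rgb, thermal):
    # "Divide": two modality-specific flows predict their own maps.
    s_rgb = decode_flow(rgb)
    s_thermal = decode_flow(thermal)
    # "Conquer": the complementary flow aggregates cues from both.
    s_comp = sigmoid(dynamic_aggregate(rgb, thermal))
    # Flow-cooperative fusion (simplified here to an average).
    return (s_rgb + s_thermal + s_comp) / 3.0

rng = np.random.default_rng(0)
rgb = rng.random((8, 32, 32))       # toy (C, H, W) RGB features
thermal = rng.random((8, 32, 32))   # toy (C, H, W) thermal features
sal = contrinet_sketch(rgb, thermal)
```

In the actual network these steps are learned convolutional decoders operating on shared encoder features; the sketch only shows how three parallel predictions can be combined into one full-resolution map.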

Bibliographic Details
Main Authors: TANG, Hao, LI, Zechao, ZHANG, Dong, HE, Shengfeng, TANG, Jinhui
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2024
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/9905
https://ink.library.smu.edu.sg/context/sis_research/article/10905/viewcontent/Divide_and_Conquer_av.pdf
Institution: Singapore Management University
Language: English
id sg-smu-ink.sis_research-10905
record_format dspace
spelling sg-smu-ink.sis_research-109052025-01-02T08:52:19Z Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T salient object detection TANG, Hao LI, Zechao ZHANG, Dong HE, Shengfeng TANG, Jinhui RGB-Thermal Salient Object Detection (RGB-T SOD) aims to pinpoint prominent objects within aligned pairs of visible and thermal infrared images. A key challenge lies in bridging the inherent disparities between RGB and Thermal modalities for effective saliency map prediction. Traditional encoder-decoder architectures, while designed for cross-modality feature interactions, may not have adequately considered the robustness against noise originating from defective modalities, thereby leading to suboptimal performance in complex scenarios. Inspired by hierarchical human visual systems, we propose ConTriNet, a robust Confluent Triple-Flow Network employing a "Divide-and-Conquer" strategy. This framework utilizes a unified encoder with specialized decoders, each addressing different subtasks of exploring modality-specific and modality-complementary information for RGB-T SOD, thereby enhancing the final saliency map prediction. Specifically, ConTriNet comprises three flows: two modality-specific flows explore cues from RGB and Thermal modalities, and a third modality-complementary flow integrates cues from both modalities. ConTriNet presents several notable advantages. It incorporates a Modality-induced Feature Modulator (MFM) in the modality-shared union encoder to minimize inter-modality discrepancies and mitigate the impact of defective samples. Additionally, a foundational Residual Atrous Spatial Pyramid Module (RASPM) in the separated flows enlarges the receptive field, allowing for the capture of multi-scale contextual information. Furthermore, a Modality-aware Dynamic Aggregation Module (MDAM) in the modality-complementary flow dynamically aggregates saliency-related cues from both modality-specific flows. 
Leveraging the proposed parallel triple-flow framework, we further refine saliency maps derived from different flows through a flow-cooperative fusion strategy, yielding a high-quality, full-resolution saliency map for the final prediction. To evaluate the robustness and stability of our approach, we collect a comprehensive RGB-T SOD benchmark, VT-IMAG, covering various real-world challenging scenarios. Extensive experiments on public benchmarks and our VT-IMAG dataset demonstrate that ConTriNet consistently outperforms state-of-the-art competitors in both common and challenging scenarios, even when dealing with incomplete modality data. 2024-01-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/9905 info:doi/10.1109/TPAMI.2024.3511621 https://ink.library.smu.edu.sg/context/sis_research/article/10905/viewcontent/Divide_and_Conquer_av.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Encoder-Decoder multi-modal fusion RGB-thermal salient object detection Artificial Intelligence and Robotics Numerical Analysis and Scientific Computing
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Encoder-Decoder
multi-modal fusion
RGB-thermal
salient object detection
Artificial Intelligence and Robotics
Numerical Analysis and Scientific Computing
spellingShingle Encoder-Decoder
multi-modal fusion
RGB-thermal
salient object detection
Artificial Intelligence and Robotics
Numerical Analysis and Scientific Computing
TANG, Hao
LI, Zechao
ZHANG, Dong
HE, Shengfeng
TANG, Jinhui
Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T salient object detection
description RGB-Thermal Salient Object Detection (RGB-T SOD) aims to pinpoint prominent objects within aligned pairs of visible and thermal infrared images. A key challenge lies in bridging the inherent disparities between RGB and Thermal modalities for effective saliency map prediction. Traditional encoder-decoder architectures, while designed for cross-modality feature interactions, may not have adequately considered the robustness against noise originating from defective modalities, thereby leading to suboptimal performance in complex scenarios. Inspired by hierarchical human visual systems, we propose ConTriNet, a robust Confluent Triple-Flow Network employing a "Divide-and-Conquer" strategy. This framework utilizes a unified encoder with specialized decoders, each addressing different subtasks of exploring modality-specific and modality-complementary information for RGB-T SOD, thereby enhancing the final saliency map prediction. Specifically, ConTriNet comprises three flows: two modality-specific flows explore cues from RGB and Thermal modalities, and a third modality-complementary flow integrates cues from both modalities. ConTriNet presents several notable advantages. It incorporates a Modality-induced Feature Modulator (MFM) in the modality-shared union encoder to minimize inter-modality discrepancies and mitigate the impact of defective samples. Additionally, a foundational Residual Atrous Spatial Pyramid Module (RASPM) in the separated flows enlarges the receptive field, allowing for the capture of multi-scale contextual information. Furthermore, a Modality-aware Dynamic Aggregation Module (MDAM) in the modality-complementary flow dynamically aggregates saliency-related cues from both modality-specific flows. Leveraging the proposed parallel triple-flow framework, we further refine saliency maps derived from different flows through a flow-cooperative fusion strategy, yielding a high-quality, full-resolution saliency map for the final prediction. 
To evaluate the robustness and stability of our approach, we collect a comprehensive RGB-T SOD benchmark, VT-IMAG, covering various real-world challenging scenarios. Extensive experiments on public benchmarks and our VT-IMAG dataset demonstrate that ConTriNet consistently outperforms state-of-the-art competitors in both common and challenging scenarios, even when dealing with incomplete modality data.
format text
author TANG, Hao
LI, Zechao
ZHANG, Dong
HE, Shengfeng
TANG, Jinhui
author_facet TANG, Hao
LI, Zechao
ZHANG, Dong
HE, Shengfeng
TANG, Jinhui
author_sort TANG, Hao
title Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T salient object detection
title_short Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T salient object detection
title_full Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T salient object detection
title_fullStr Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T salient object detection
title_full_unstemmed Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T salient object detection
title_sort divide-and-conquer: confluent triple-flow network for rgb-t salient object detection
publisher Institutional Knowledge at Singapore Management University
publishDate 2024
url https://ink.library.smu.edu.sg/sis_research/9905
https://ink.library.smu.edu.sg/context/sis_research/article/10905/viewcontent/Divide_and_Conquer_av.pdf
_version_ 1821237281194770432