Alleviating the inconsistency of multimodal data in cross-modal retrieval

With the explosive growth of multimodal Internet data, cross-modal hashing retrieval has become crucial for semantically searching instances across different modalities. However, existing cross-modal retrieval methods rely on assumptions of perfect consistency between modalities and between modalities and labels, which often do not hold in real-world data. We introduce two types of inconsistency: Modality-Modality (M-M) and Modality-Label (M-L) inconsistencies. We further validate the prevalence of inconsistent data in multimodal datasets and highlight that it reduces the accuracy of existing cross-modal retrieval methods. In this paper, we propose a novel framework called Inconsistency Alleviated Cross-Modal Retrieval (IA-CMR), addressing challenges posed by these inconsistencies. We first utilize two forms of contrastive learning loss and a mutual exclusion constraint to effectively disentangle modal information into modality-common hash codes and modality-unique hash codes. Our dedicated design in modality disentanglement is capable of alleviating the M-M inconsistency. Subsequently, we refine common labels through a label refinement loss and employ a Cross-modal Common Semantic Alignment (CCSA) module for effective alignment. The label refinement process and the CCSA module collectively handle the M-L inconsistency issue. IA-CMR outperforms 9 comparison baselines on two benchmark multimodal datasets, achieving an improvement in retrieval accuracy of up to 25.13%. The results confirm the effectiveness of IA-CMR in alleviating inconsistency and enhancing cross-modal retrieval performance.


Bibliographic Details
Main Authors: Li, Tieying, Yang, Xiaochun, Ke, Yiping, Wang, Bin, Liu, Yinan, Xu, Jiaxing
Other Authors: College of Computing and Data Science
Format: Conference or Workshop Item
Language: English
Published: 2024
Subjects:
Online Access:https://hdl.handle.net/10356/180605
Institution: Nanyang Technological University
id sg-ntu-dr.10356-180605
record_format dspace
spelling sg-ntu-dr.10356-1806052024-10-15T02:48:02Z Alleviating the inconsistency of multimodal data in cross-modal retrieval Li, Tieying Yang, Xiaochun Ke, Yiping Wang, Bin Liu, Yinan Xu, Jiaxing College of Computing and Data Science School of Computer Science and Engineering 2024 IEEE 40th International Conference on Data Engineering (ICDE) Computer and Information Science Disentangled hash learning Cross-modal retrieval Cross-modal hashing Label refinement Cross-modal common semantic alignment With the explosive growth of multimodal Internet data, cross-modal hashing retrieval has become crucial for semantically searching instances across different modalities. However, existing cross-modal retrieval methods rely on assumptions of perfect consistency between modalities and between modalities and labels, which often do not hold in real-world data. We introduce two types of inconsistency: Modality-Modality (M-M) and Modality-Label (M-L) inconsistencies. We further validate the prevalence of inconsistent data in multimodal datasets and highlight that it reduces the accuracy of existing cross-modal retrieval methods. In this paper, we propose a novel framework called Inconsistency Alleviated Cross-Modal Retrieval (IA-CMR), addressing challenges posed by these inconsistencies. We first utilize two forms of contrastive learning loss and a mutual exclusion constraint to effectively disentangle modal information into modality-common hash codes and modality-unique hash codes. Our dedicated design in modality disentanglement is capable of alleviating the M-M inconsistency. Subsequently, we refine common labels through a label refinement loss and employ a Cross-modal Common Semantic Alignment (CCSA) module for effective alignment. The label refinement process and the CCSA module collectively handle the M-L inconsistency issue. IA-CMR outperforms 9 comparison baselines on two benchmark multimodal datasets, achieving an improvement in retrieval accuracy of up to 25.13%. The results confirm the effectiveness of IA-CMR in alleviating inconsistency and enhancing cross-modal retrieval performance. Ministry of Education (MOE) National Research Foundation (NRF) Submitted/Accepted version The work is partially supported by the National Natural Science Foundation of China (Nos. U22A2025, 62072088, 62232007, U23A20309), Liaoning Provincial Science and Technology Plan Project - Key R&D Department of Science and Technology (No. 2023JH2/101300182), and the Ministry of Education, Singapore under its MOE Academic Research Fund Tier 2 (STEM RIE2025 Award MOE-T2EP20220-0006), and the National Research Foundation, Singapore under its Industry Alignment Fund – Pre-positioning (IAF-PP) Funding Initiative. 2024-10-15T02:48:02Z 2024-10-15T02:48:02Z 2024 Conference Paper Li, T., Yang, X., Ke, Y., Wang, B., Liu, Y. & Xu, J. (2024). Alleviating the inconsistency of multimodal data in cross-modal retrieval. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 4643-4656. https://dx.doi.org/10.1109/ICDE60146.2024.00353 9798350317152 2375-026X https://hdl.handle.net/10356/180605 10.1109/ICDE60146.2024.00353 2-s2.0-85200502291 4643 4656 en MOE-T2EP20220-0006 SDSC-2020-004 IAF-PP © 2024 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/ICDE60146.2024.00353. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
Disentangled hash learning
Cross-modal retrieval
Cross-modal hashing
Label refinement
Cross-modal common semantic alignment
spellingShingle Computer and Information Science
Disentangled hash learning
Cross-modal retrieval
Cross-modal hashing
Label refinement
Cross-modal common semantic alignment
Li, Tieying
Yang, Xiaochun
Ke, Yiping
Wang, Bin
Liu, Yinan
Xu, Jiaxing
Alleviating the inconsistency of multimodal data in cross-modal retrieval
description With the explosive growth of multimodal Internet data, cross-modal hashing retrieval has become crucial for semantically searching instances across different modalities. However, existing cross-modal retrieval methods rely on assumptions of perfect consistency between modalities and between modalities and labels, which often do not hold in real-world data. We introduce two types of inconsistency: Modality-Modality (M-M) and Modality-Label (M-L) inconsistencies. We further validate the prevalence of inconsistent data in multimodal datasets and highlight that it reduces the accuracy of existing cross-modal retrieval methods. In this paper, we propose a novel framework called Inconsistency Alleviated Cross-Modal Retrieval (IA-CMR), addressing challenges posed by these inconsistencies. We first utilize two forms of contrastive learning loss and a mutual exclusion constraint to effectively disentangle modal information into modality-common hash codes and modality-unique hash codes. Our dedicated design in modality disentanglement is capable of alleviating the M-M inconsistency. Subsequently, we refine common labels through a label refinement loss and employ a Cross-modal Common Semantic Alignment (CCSA) module for effective alignment. The label refinement process and the CCSA module collectively handle the M-L inconsistency issue. IA-CMR outperforms 9 comparison baselines on two benchmark multimodal datasets, achieving an improvement in retrieval accuracy of up to 25.13%. The results confirm the effectiveness of IA-CMR in alleviating inconsistency and enhancing cross-modal retrieval performance.
author2 College of Computing and Data Science
author_facet College of Computing and Data Science
Li, Tieying
Yang, Xiaochun
Ke, Yiping
Wang, Bin
Liu, Yinan
Xu, Jiaxing
format Conference or Workshop Item
author Li, Tieying
Yang, Xiaochun
Ke, Yiping
Wang, Bin
Liu, Yinan
Xu, Jiaxing
author_sort Li, Tieying
title Alleviating the inconsistency of multimodal data in cross-modal retrieval
title_short Alleviating the inconsistency of multimodal data in cross-modal retrieval
title_full Alleviating the inconsistency of multimodal data in cross-modal retrieval
title_fullStr Alleviating the inconsistency of multimodal data in cross-modal retrieval
title_full_unstemmed Alleviating the inconsistency of multimodal data in cross-modal retrieval
title_sort alleviating the inconsistency of multimodal data in cross-modal retrieval
publishDate 2024
url https://hdl.handle.net/10356/180605
_version_ 1814777778869895168