Alleviating the inconsistency of multimodal data in cross-modal retrieval
With the explosive growth of multimodal Internet data, cross-modal hashing retrieval has become crucial for semantically searching instances across different modalities. However, existing cross-modal retrieval methods rely on assumptions of perfect consistency between modalities, and between modalities and labels, which often do not hold in real-world data. We introduce two types of inconsistency: Modality-Modality (M-M) and Modality-Label (M-L) inconsistency. We further validate the prevalent existence of inconsistent data in multimodal datasets and show that it reduces the accuracy of existing cross-modal retrieval methods. In this paper, we propose a novel framework called Inconsistency Alleviated Cross-Modal Retrieval (IA-CMR) to address the challenges posed by these inconsistencies. We first utilize two forms of contrastive learning loss and a mutual exclusion constraint to effectively disentangle modal information into modality-common hash codes and modality-unique hash codes. This dedicated modality-disentanglement design alleviates the M-M inconsistency. Subsequently, we refine common labels through a label refinement loss and employ a Cross-modal Common Semantic Alignment (CCSA) module for effective alignment; together, the label refinement process and the CCSA module handle the M-L inconsistency. IA-CMR outperforms 9 comparison baselines on two benchmark multimodal datasets, achieving an improvement in retrieval accuracy of up to 25.13%. The results confirm the effectiveness of IA-CMR in alleviating inconsistency and enhancing cross-modal retrieval performance.
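The abstract's disentanglement step pairs a contrastive loss with a mutual exclusion constraint. The minimal sketch below is illustrative only, not the paper's implementation: `contrastive_loss`, `mutual_exclusion`, the temperature value, and the toy vectors are all assumptions, showing roughly how a contrastive objective rewards aligned image-text pairs while an exclusion penalty pushes common and unique codes toward orthogonality.

```python
import math

def cosine(u, v):
    # cosine similarity between two non-zero vectors
    num = sum(a * b for a, b in zip(u, v))
    du = math.sqrt(sum(a * a for a in u))
    dv = math.sqrt(sum(b * b for b in v))
    return num / (du * dv)

def contrastive_loss(img_codes, txt_codes, tau=0.5):
    # InfoNCE-style loss: each image code should be most similar
    # to its paired text code among all candidate text codes.
    loss = 0.0
    for i, u in enumerate(img_codes):
        sims = [math.exp(cosine(u, v) / tau) for v in txt_codes]
        loss += -math.log(sims[i] / sum(sims))
    return loss / len(img_codes)

def mutual_exclusion(common, unique):
    # penalize overlap between the modality-common and
    # modality-unique codes via the squared dot product;
    # orthogonal codes incur zero penalty
    dot = sum(a * b for a, b in zip(common, unique))
    return dot * dot

# toy usage: correctly paired codes yield a lower contrastive loss
img = [[1.0, 0.0], [0.0, 1.0]]
txt = [[1.0, 0.0], [0.0, 1.0]]
swapped = [txt[1], txt[0]]
aligned = contrastive_loss(img, txt)
shuffled = contrastive_loss(img, swapped)
penalty = mutual_exclusion([1.0, 0.0], [0.0, 1.0])  # → 0.0
```

In a real training loop these two terms would be summed (with weighting hyperparameters) and minimized jointly over learned hash codes; the sketch only illustrates the direction each term pushes.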
Saved in:
Main Authors: Li, Tieying; Yang, Xiaochun; Ke, Yiping; Wang, Bin; Liu, Yinan; Xu, Jiaxing
Other Authors: College of Computing and Data Science; School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2024
Subjects: Computer and Information Science; Disentangled hash learning; Cross-modal retrieval; Cross-modal hashing; Label refinement; Cross-modal common semantic alignment
Online Access: https://hdl.handle.net/10356/180605
Institution: Nanyang Technological University
id
sg-ntu-dr.10356-180605
record_format
dspace
spelling
Title: Alleviating the inconsistency of multimodal data in cross-modal retrieval
Authors: Li, Tieying; Yang, Xiaochun; Ke, Yiping; Wang, Bin; Liu, Yinan; Xu, Jiaxing
Affiliations: College of Computing and Data Science; School of Computer Science and Engineering
Conference: 2024 IEEE 40th International Conference on Data Engineering (ICDE)
Subjects: Computer and Information Science; Disentangled hash learning; Cross-modal retrieval; Cross-modal hashing; Label refinement; Cross-modal common semantic alignment
Funding: Ministry of Education (MOE); National Research Foundation (NRF). Submitted/Accepted version. The work is partially supported by the National Natural Science Foundation of China (Nos. U22A2025, 62072088, 62232007, U23A20309), the Liaoning Provincial Science and Technology Plan Project - Key R&D Department of Science and Technology (No. 2023JH2/101300182), the Ministry of Education, Singapore under its MOE Academic Research Fund Tier 2 (STEM RIE2025 Award MOE-T2EP20220-0006), and the National Research Foundation, Singapore under its Industry Alignment Fund – Pre-positioning (IAF-PP) Funding Initiative.
Deposited: 2024-10-15T02:48:02Z
Published: 2024
Type: Conference Paper
Citation: Li, T., Yang, X., Ke, Y., Wang, B., Liu, Y. & Xu, J. (2024). Alleviating the inconsistency of multimodal data in cross-modal retrieval. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 4643-4656. https://dx.doi.org/10.1109/ICDE60146.2024.00353
ISBN: 9798350317152
ISSN: 2375-026X
Handle: https://hdl.handle.net/10356/180605
DOI: 10.1109/ICDE60146.2024.00353
Scopus: 2-s2.0-85200502291
Pages: 4643-4656
Language: en
Grants: MOE-T2EP20220-0006; SDSC-2020-004; IAF-PP
Rights: © 2024 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/ICDE60146.2024.00353.
Format: application/pdf
institution
Nanyang Technological University
building
NTU Library
continent
Asia
country
Singapore
content_provider
NTU Library
collection
DR-NTU
language
English
topic
Computer and Information Science; Disentangled hash learning; Cross-modal retrieval; Cross-modal hashing; Label refinement; Cross-modal common semantic alignment
description
With the explosive growth of multimodal Internet data, cross-modal hashing retrieval has become crucial for semantically searching instances across different modalities. However, existing cross-modal retrieval methods rely on assumptions of perfect consistency between modalities, and between modalities and labels, which often do not hold in real-world data. We introduce two types of inconsistency: Modality-Modality (M-M) and Modality-Label (M-L) inconsistency. We further validate the prevalent existence of inconsistent data in multimodal datasets and show that it reduces the accuracy of existing cross-modal retrieval methods. In this paper, we propose a novel framework called Inconsistency Alleviated Cross-Modal Retrieval (IA-CMR) to address the challenges posed by these inconsistencies. We first utilize two forms of contrastive learning loss and a mutual exclusion constraint to effectively disentangle modal information into modality-common hash codes and modality-unique hash codes. This dedicated modality-disentanglement design alleviates the M-M inconsistency. Subsequently, we refine common labels through a label refinement loss and employ a Cross-modal Common Semantic Alignment (CCSA) module for effective alignment; together, the label refinement process and the CCSA module handle the M-L inconsistency. IA-CMR outperforms 9 comparison baselines on two benchmark multimodal datasets, achieving an improvement in retrieval accuracy of up to 25.13%. The results confirm the effectiveness of IA-CMR in alleviating inconsistency and enhancing cross-modal retrieval performance.
author2
College of Computing and Data Science
format
Conference or Workshop Item
author
Li, Tieying; Yang, Xiaochun; Ke, Yiping; Wang, Bin; Liu, Yinan; Xu, Jiaxing
author_sort
Li, Tieying
title
Alleviating the inconsistency of multimodal data in cross-modal retrieval
publishDate
2024
url
https://hdl.handle.net/10356/180605