Alleviating the inconsistency of multimodal data in cross-modal retrieval
With the explosive growth of multimodal Internet data, cross-modal hashing retrieval has become crucial for semantically searching instances across different modalities. However, existing cross-modal retrieval methods rely on assumptions of perfect consistency between modalities, and between modalities and labels, which often do not hold in real-world data. We introduce two types of inconsistency: Modality-Modality (M-M) and Modality-Label (M-L) inconsistency. We further validate the prevalent existence of inconsistent data in multimodal datasets and show that it reduces the accuracy of existing cross-modal retrieval methods. In this paper, we propose a novel framework called Inconsistency Alleviated Cross-Modal Retrieval (IA-CMR) to address the challenges posed by these inconsistencies. We first utilize two forms of contrastive learning loss and a mutual exclusion constraint to effectively disentangle modal information into modality-common hash codes and modality-unique hash codes. This dedicated modality-disentanglement design alleviates the M-M inconsistency. Subsequently, we refine common labels through a label refinement loss and employ a Cross-modal Common Semantic Alignment (CCSA) module for effective alignment; together, the label refinement process and the CCSA module handle the M-L inconsistency. IA-CMR outperforms 9 comparison baselines on two benchmark multimodal datasets, achieving an improvement in retrieval accuracy of up to 25.13%. The results confirm the effectiveness of IA-CMR in alleviating inconsistency and enhancing cross-modal retrieval performance.
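The abstract's disentanglement step pairs a contrastive loss with a mutual exclusion constraint. The minimal sketch below is illustrative only, not the paper's implementation: `contrastive_loss`, `mutual_exclusion`, the temperature value, and the toy vectors are all assumptions, showing roughly how a contrastive objective rewards aligned image-text pairs while an exclusion penalty pushes common and unique codes toward orthogonality.

```python
import math

def cosine(u, v):
    # cosine similarity between two non-zero vectors
    num = sum(a * b for a, b in zip(u, v))
    du = math.sqrt(sum(a * a for a in u))
    dv = math.sqrt(sum(b * b for b in v))
    return num / (du * dv)

def contrastive_loss(img_codes, txt_codes, tau=0.5):
    # InfoNCE-style loss: each image code should be most similar
    # to its paired text code among all candidate text codes.
    loss = 0.0
    for i, u in enumerate(img_codes):
        sims = [math.exp(cosine(u, v) / tau) for v in txt_codes]
        loss += -math.log(sims[i] / sum(sims))
    return loss / len(img_codes)

def mutual_exclusion(common, unique):
    # penalize overlap between the modality-common and
    # modality-unique codes via the squared dot product;
    # orthogonal codes incur zero penalty
    dot = sum(a * b for a, b in zip(common, unique))
    return dot * dot

# toy usage: correctly paired codes yield a lower contrastive loss
img = [[1.0, 0.0], [0.0, 1.0]]
txt = [[1.0, 0.0], [0.0, 1.0]]
swapped = [txt[1], txt[0]]
aligned = contrastive_loss(img, txt)
shuffled = contrastive_loss(img, swapped)
penalty = mutual_exclusion([1.0, 0.0], [0.0, 1.0])  # → 0.0
```

In a real training loop these two terms would be summed (with weighting hyperparameters) and minimized jointly over learned hash codes; the sketch only illustrates the direction each term pushes.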
Saved in:
Main Authors: Li, Tieying; Yang, Xiaochun; Ke, Yiping; Wang, Bin; Liu, Yinan; Xu, Jiaxing
Other Authors: College of Computing and Data Science; School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2024
Subjects: Computer and Information Science; Disentangled hash learning; Cross-modal retrieval; Cross-modal hashing; Label refinement; Cross-modal common semantic alignment
Online Access: https://hdl.handle.net/10356/180605
Institution: Nanyang Technological University
id
sg-ntu-dr.10356-180605
record_format
dspace
spelling
Title: Alleviating the inconsistency of multimodal data in cross-modal retrieval
Authors: Li, Tieying; Yang, Xiaochun; Ke, Yiping; Wang, Bin; Liu, Yinan; Xu, Jiaxing
Affiliations: College of Computing and Data Science; School of Computer Science and Engineering
Conference: 2024 IEEE 40th International Conference on Data Engineering (ICDE)
Subjects: Computer and Information Science; Disentangled hash learning; Cross-modal retrieval; Cross-modal hashing; Label refinement; Cross-modal common semantic alignment
Funding: Ministry of Education (MOE); National Research Foundation (NRF). Submitted/Accepted version. The work is partially supported by the National Natural Science Foundation of China (Nos. U22A2025, 62072088, 62232007, U23A20309), the Liaoning Provincial Science and Technology Plan Project - Key R&D Department of Science and Technology (No. 2023JH2/101300182), the Ministry of Education, Singapore under its MOE Academic Research Fund Tier 2 (STEM RIE2025 Award MOE-T2EP20220-0006), and the National Research Foundation, Singapore under its Industry Alignment Fund – Pre-positioning (IAF-PP) Funding Initiative.
Deposited: 2024-10-15T02:48:02Z
Published: 2024
Type: Conference Paper
Citation: Li, T., Yang, X., Ke, Y., Wang, B., Liu, Y. & Xu, J. (2024). Alleviating the inconsistency of multimodal data in cross-modal retrieval. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 4643-4656. https://dx.doi.org/10.1109/ICDE60146.2024.00353
ISBN: 9798350317152
ISSN: 2375-026X
Handle: https://hdl.handle.net/10356/180605
DOI: 10.1109/ICDE60146.2024.00353
Scopus: 2-s2.0-85200502291
Pages: 4643-4656
Language: en
Grants: MOE-T2EP20220-0006; SDSC-2020-004; IAF-PP
Rights: © 2024 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/ICDE60146.2024.00353.
Format: application/pdf
institution
Nanyang Technological University
building
NTU Library
continent
Asia
country
Singapore
content_provider
NTU Library
collection
DR-NTU
language
English
topic
Computer and Information Science; Disentangled hash learning; Cross-modal retrieval; Cross-modal hashing; Label refinement; Cross-modal common semantic alignment
description
With the explosive growth of multimodal Internet data, cross-modal hashing retrieval has become crucial for semantically searching instances across different modalities. However, existing cross-modal retrieval methods rely on assumptions of perfect consistency between modalities, and between modalities and labels, which often do not hold in real-world data. We introduce two types of inconsistency: Modality-Modality (M-M) and Modality-Label (M-L) inconsistency. We further validate the prevalent existence of inconsistent data in multimodal datasets and show that it reduces the accuracy of existing cross-modal retrieval methods. In this paper, we propose a novel framework called Inconsistency Alleviated Cross-Modal Retrieval (IA-CMR) to address the challenges posed by these inconsistencies. We first utilize two forms of contrastive learning loss and a mutual exclusion constraint to effectively disentangle modal information into modality-common hash codes and modality-unique hash codes. This dedicated modality-disentanglement design alleviates the M-M inconsistency. Subsequently, we refine common labels through a label refinement loss and employ a Cross-modal Common Semantic Alignment (CCSA) module for effective alignment; together, the label refinement process and the CCSA module handle the M-L inconsistency. IA-CMR outperforms 9 comparison baselines on two benchmark multimodal datasets, achieving an improvement in retrieval accuracy of up to 25.13%. The results confirm the effectiveness of IA-CMR in alleviating inconsistency and enhancing cross-modal retrieval performance.
author2
College of Computing and Data Science
format
Conference or Workshop Item
author
Li, Tieying; Yang, Xiaochun; Ke, Yiping; Wang, Bin; Liu, Yinan; Xu, Jiaxing
author_sort
Li, Tieying
title
Alleviating the inconsistency of multimodal data in cross-modal retrieval
publishDate
2024
url
https://hdl.handle.net/10356/180605