Concept-driven multi-modality fusion for video search

Just as human perception gathers information from multiple sources in natural, multi-modality forms, learning from multiple modalities has become an effective scheme for various information retrieval problems. In this paper, we propose a novel multi-modality fusion approach for video search, where the search modalities are derived from a diverse set of knowledge sources, such as text transcripts from speech recognition, low-level visual features from video frames, and high-level semantic visual concepts from supervised learning. Since the effectiveness of each search modality depends greatly on the specific user query, promptly determining the importance of a modality to a user query is a critical issue in multi-modality search. Our proposed approach, named concept-driven multi-modality fusion (CDMF), explores a large set of predefined semantic concepts to compute multi-modality fusion weights in a novel way. Specifically, CDMF decomposes the query-modality relationship into two components that are much easier to compute: query-concept relatedness and concept-modality relevancy. The former can be efficiently estimated online using semantic and visual mapping techniques, while the latter can be computed offline from the concept-detection accuracy of each modality. This decomposition enables adaptive learning of fusion weights for each user query on the fly, in contrast to existing approaches, which mostly adopt predefined query classes and/or modality weights. Experimental results on the TREC video retrieval evaluation 2005-2008 datasets validate the effectiveness of our approach, which outperforms existing multi-modality fusion methods and achieves near-optimal performance (relative to oracle fusion) for many test queries.
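The decomposition described above lends itself to a short sketch. The following minimal Python example is an illustrative assumption, not the authors' implementation: the function name, array shapes, and the clipping/normalization steps are all hypothetical. It shows how a per-query fusion weight for each modality could be obtained by combining an online estimate of query-concept relatedness with an offline concept-modality relevancy matrix.

```python
import numpy as np

def cdmf_fusion_weights(query_concept_relatedness, concept_modality_relevancy):
    """Sketch of the CDMF weight decomposition (names/shapes are assumptions).

    query_concept_relatedness: (C,) vector r(q, c), estimated online per query.
    concept_modality_relevancy: (C, M) matrix a(c, m), computed offline from
        each modality's concept-detection accuracy.
    Returns an (M,) vector of fusion weights normalized to sum to 1.
    """
    raw = query_concept_relatedness @ concept_modality_relevancy  # (M,)
    raw = np.clip(raw, 0.0, None)  # keep weights non-negative
    total = raw.sum()
    return raw / total if total > 0 else np.full(raw.shape, 1.0 / raw.size)

# Toy example: 4 concepts, 3 modalities (text, low-level visual, concept-based).
r_qc = np.array([0.7, 0.1, 0.0, 0.2])          # online relatedness for query q
a_cm = np.array([[0.9, 0.2, 0.6],
                 [0.1, 0.8, 0.5],
                 [0.3, 0.4, 0.7],
                 [0.6, 0.3, 0.4]])             # offline relevancy estimates
weights = cdmf_fusion_weights(r_qc, a_cm)
fused_score = weights @ np.array([0.5, 0.2, 0.9])  # fuse per-modality scores
```

Because the relevancy matrix is fixed offline, only the short relatedness vector needs to be estimated at query time, which is what allows the weights to adapt to each query on the fly rather than relying on predefined query classes.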

Bibliographic Details
Main Authors: WEI, Xiao-Yong, JIANG, Yu-Gang, NGO, Chong-wah
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2011
DOI: 10.1109/TCSVT.2011.2105597
License: CC BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Collection: Research Collection School Of Computing and Information Systems
Subjects: Concept-driven fusion; multi-modality; semantic concept; video search; Data Storage Systems; Graphics and Human Computer Interfaces
Online Access:https://ink.library.smu.edu.sg/sis_research/6317
https://ink.library.smu.edu.sg/context/sis_research/article/7320/viewcontent/tcsvt10_xiaoyong.pdf
Institution: Singapore Management University