Fusion of multimodal embeddings for ad-hoc video search

Bibliographic Details
Main Authors:	FRANCIS, Danny; NGUYEN, Phuong Anh; HUET, Benoit; NGO, Chong-wah
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2019
Subjects:	Deep learning; Multimedia; Multimodal embeddings; Multimodal fusion; Video search; Databases and Information Systems; Graphics and Human Computer Interfaces
Online Access:	https://ink.library.smu.edu.sg/sis_research/6462 https://ink.library.smu.edu.sg/context/sis_research/article/7465/viewcontent/Francis_Fusion_of_Multimodal_Embeddings_for_Ad_Hoc_Video_Search_ICCVW_2019_paper.pdf
Institution:	Singapore Management University

Description
Summary:	The challenge of Ad-Hoc Video Search (AVS) originates from free-form (i.e., no pre-defined vocabulary) and freestyle (i.e., natural language) query descriptions. Bridging the semantic gap between AVS queries and videos is highly difficult, as evidenced by the low retrieval accuracy of AVS benchmarking in TRECVID. In this paper, we study a new method to fuse multimodal embeddings that have been derived from completely disjoint datasets. This method is tested on two datasets for two distinct tasks: on MSR-VTT for unique video retrieval and on V3C1 for multiple-video retrieval.


Similar Items