Efficient cross-modal video retrieval with meta-optimized frames

Cross-modal video retrieval aims to retrieve semantically relevant videos given a textual query and is one of the fundamental multimedia tasks. Most top-performing methods primarily leverage a Vision Transformer (ViT) to extract video features [1]-[3]. However, they suffer from the high computat...

Bibliographic Details
Main Authors:	HAN, Ning; YANG, Xun; LIM, Ee-peng; CHEN, Hao; SUN, Qianru
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Cross-Modal; Multimodal; Video Compression; Video Retrieval; Databases and Information Systems; Graphics and Human Computer Interfaces
Online Access:	https://ink.library.smu.edu.sg/sis_research/9034 https://ink.library.smu.edu.sg/context/sis_research/article/10037/viewcontent/2210.08452v1_sv.pdf

Similar Items