Efficient cross-modal video retrieval with meta-optimized frames

Cross-modal video retrieval aims to retrieve semantically relevant videos when given a textual query, and is one of the fundamental multimedia tasks. Most top-performing methods primarily leverage Vision Transformer (ViT) to extract video features [1]-[3]. However, they suffer from the high computat...

Bibliographic Details
Main Authors:	HAN, Ning; YANG, Xun; LIM, Ee-peng; CHEN, Hao; SUN, Qianru
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Cross-Modal; Multimodal; Video Compression; Video Retrieval; Databases and Information Systems; Graphics and Human Computer Interfaces
Online Access:	https://ink.library.smu.edu.sg/sis_research/9034 https://ink.library.smu.edu.sg/context/sis_research/article/10037/viewcontent/2210.08452v1_sv.pdf
Institution:	Singapore Management University

Similar Items

Towards semantic, debiased and moment video retrieval
by: Satar, Burak
Published: (2025)

Cross-modal Moment Localization in Videos
by: Meng Liu, et al.
Published: (2020)

Temporal sentence grounding in videos: a survey and future directions
by: Zhang, Hao, et al.
Published: (2023)

Cross-modal recipe retrieval with stacked attention model
by: CHEN, Jing-Jing, et al.
Published: (2018)

Cross-modal recipe retrieval: How to cook this dish?
by: CHEN, Jingjing, et al.
Published: (2017)

Tree-augmented cross-modal encoding for complex-query video retrieval
by: YANG, Xun, et al.
Published: (2020)

Recent advances in content-based video analysis
by: NGO, Chong-wah, et al.
Published: (2001)

Deep understanding of cooking procedure for cross-modal recipe retrieval
by: CHEN, Jingjing, et al.
Published: (2018)

VRAG: Region Attention Graph for Content-Based Video Retrieval
by: KENNARD NG POOL HUA
Published: (2021)

Content-based video retrieval: Three example systems from TRECVid
by: Smeaton, A.F., et al.
Published: (2013)

An integrated system for content-based video retrieval and browsing
by: Zhang, H.J., et al.
Published: (2014)

Mix-DANN and dynamic-modal-distillation for video domain adaptation
by: YIN, Yuehao, et al.
Published: (2022)

VisionGo: Bridging users and multimedia video retrieval
by: Neo, S.-Y., et al.
Published: (2013)

Contrastive video question answering via video graph transformer
by: XIAO, Junbin, et al.
Published: (2023)

Vireo @ video browser showdown 2019
by: NGUYEN, Phuong Anh, et al.
Published: (2019)

Alleviating the inconsistency of multimodal data in cross-modal retrieval
by: Li, Tieying, et al.
Published: (2024)

Cross-modal recipe retrieval with rich food attributes
by: CHEN, Jingjing, et al.
Published: (2017)

Video retrieval based on object discovery
by: Liu D., et al.
Published: (2018)

Deep Understanding of Cooking Procedure for Cross-modal Recipe Retrieval
by: Jing-Jing Chen, et al.
Published: (2020)

Learning a cross-modal hashing network for multimedia search
by: Tan, Yap Peng, et al.
Published: (2018)

Cross-modal Recipe Retrieval with Rich Food Attributes
by: Jingjing Chen, et al.
Published: (2020)

Cross-domain cross-modal food transfer
by: ZHU, Bin, et al.
Published: (2020)

Deep multimodal learning for affective analysis and retrieval
by: PANG, Lei, et al.
Published: (2015)

Approach for video retrieval by video clip
by: PENG, Y., et al.
Published: (2003)

VIREO @ Video Browser Showdown 2020
by: NGUYEN, Phuong Anh, et al.
Published: (2020)

CONQUER: Contextual query-aware ranking for video corpus moment retrieval
by: HOU, Zhijian, et al.
Published: (2021)

Clip-based similarity measure for hierarchical video retrieval
by: PENG, Yuxin, et al.
Published: (2004)

Interactive search vs. automatic search: An extensive study on video retrieval
by: NGUYEN, Phuong-Anh, et al.
Published: (2021)

Measuring novelty and redundancy with multiple modalities in cross-lingual broadcast news
by: WU, Xiao, et al.
Published: (2008)

Neighbourhood structure preserving cross-modal embedding for video hyperlinking
by: HAO, Yanbin, et al.
Published: (2020)

Concept-driven multi-modality fusion for video search
by: WEI, Xiao-Yong, et al.
Published: (2011)

Unsupervised video hashing with multi-granularity contextualization and multi-structure preservation
by: HAO, Yanbin, et al.
Published: (2022)

Improvement of error concealment technique for H.264 scalable video coding
by: Simon Jude Que Lam
Published: (2012)

Watching 360° videos together
by: TANG, Anthony, et al.
Published: (2017)

R2GAN: Cross-modal recipe retrieval with generative adversarial network
by: ZHU, Bin, et al.
Published: (2019)

GEO-REFERENCED VIDEO RETRIEVAL: TEXT ANNOTATION AND SIMILARITY SEARCH
by: YIN YIFANG
Published: (2016)

Automatic parsing and indexing of news video
by: Zhang, H., et al.
Published: (2014)

A modified video coding algorithm based on the H.261 standard
by: Tan, Jocelyn Arlene Y., et al.
Published: (2003)

Fusion of multimodal embeddings for ad-hoc video search
by: FRANCIS, Danny, et al.
Published: (2019)

Triadic temporal-semantic alignment for weakly-supervised video moment retrieval
by: LIU, Jin, et al.
Published: (2024)