Temporal sentence grounding in videos: a survey and future directions

Temporal sentence grounding in videos (TSGV), also known as natural language video localization (NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that semantically corresponds to a language query from an untrimmed video. Connecting computer vision and natural language, TSGV has dr...

Bibliographic Details
Main Authors:	Zhang, Hao; Sun, Aixin; Jing, Wei; Zhou, Joey Tianyi
Other Authors:	School of Computer Science and Engineering
Format:	Article
Language:	English
Published:	2023
Subjects:	Engineering::Computer science and engineering; Cross-Modal Video Retrieval; Multimodal Learning
Online Access:	https://hdl.handle.net/10356/172187
Institution:	Nanyang Technological University

Similar Items

Efficient cross-modal video retrieval with meta-optimized frames
by: HAN, Ning, et al.
Published: (2024)

Towards semantic, debiased and moment video retrieval
by: Satar, Burak
Published: (2025)

Cross-modal Moment Localization in Videos
by: Meng Liu, et al.
Published: (2020)

Learning a cross-modal hashing network for multimedia search
by: Tan, Yap Peng, et al.
Published: (2018)

Towards temporal sentence grounding in videos
by: Zhang, Hao
Published: (2022)

Deep Understanding of Cooking Procedure for Cross-modal Recipe Retrieval
by: Jing-Jing Chen, et al.
Published: (2020)

Cross-modal recipe retrieval with stacked attention model
by: CHEN, Jing-Jing, et al.
Published: (2018)

Attentive Moment Retrieval in Videos
by: Meng Liu, et al.
Published: (2020)

Probabilistic temporal multimedia data mining
by: Bhatt, C., et al.
Published: (2013)

Alleviating the inconsistency of multimodal data in cross-modal retrieval
by: Li, Tieying, et al.
Published: (2024)

VRAG: Region Attention Graph for Content-Based Video Retrieval
by: KENNARD NG POOL HUA
Published: (2021)

Learning language to symbol and language to vision mapping for visual grounding
by: He, Su, et al.
Published: (2022)

VisionGo: Bridging users and multimedia video retrieval
by: Neo, S.-Y., et al.
Published: (2013)

Content-based video retrieval: Three example systems from TRECVid
by: Smeaton, A.F., et al.
Published: (2013)

Cross-modal Recipe Retrieval with Rich Food Attributes
by: Jingjing Chen, et al.
Published: (2020)

Cross-modal recipe retrieval: How to cook this dish?
by: CHEN, Jingjing, et al.
Published: (2017)

Self-supervised video hashing with hierarchical binary auto-encoder
by: Song, Jingkuan, et al.
Published: (2020)

Contrastive video question answering via video graph transformer
by: XIAO, Junbin, et al.
Published: (2023)

GEO-REFERENCED VIDEO RETRIEVAL: TEXT ANNOTATION AND SIMILARITY SEARCH
by: YIN YIFANG
Published: (2016)

Deep multimodal learning for affective analysis and retrieval
by: PANG, Lei, et al.
Published: (2015)

Near-duplicate video retrieval: Current research and future trends
by: LIU, Jiajun, et al.
Published: (2013)

VideoQA: Question answering on news video
by: Yang, H., et al.
Published: (2013)

Automatic parsing and indexing of news video
by: Zhang, H., et al.
Published: (2014)

An integrated system for content-based video retrieval and browsing
by: Zhang, H.J., et al.
Published: (2014)

Play and Rewind: Optimizing Binary Representations of Videos by Self-Supervised Temporal Hashing
by: Hanwang Zhang, et al.
Published: (2020)

A multimodal and multilevel ranking framework for content-based video retrieval
by: HOI, Steven C. H., et al.
Published: (2007)

An online video recommendation framework using rich information
by: Zhao, X., et al.
Published: (2013)

APPLICATION OF MULTIMEDIA IN E-LEARNING: LECTURE VIDEOS AND MULTIMODAL SYSTEMS
by: SUBHASREE BASU
Published: (2018)

Video retrieval based on object discovery
by: Liu D., et al.
Published: (2018)

CONQUER: Contextual query-aware ranking for video corpus moment retrieval
by: HOU, Zhijian, et al.
Published: (2021)

Stratification approach to modeling video
by: Chua, T.-S., et al.
Published: (2013)

Poster: Towards efficient spatio-temporal video grounding in pervasive mobile devices
by: WEERAKOON MUDIYANSELAGE, Dulanga Kaveesha, et al.
Published: (2024)

Beyond ranking loss : deep holographic networks for multi-label video search
by: Chen, Zhuo, et al.
Published: (2020)

Layered video delivery from multiple servers
by: Tan, Y.H., et al.
Published: (2016)

CUHK at imageCLEF 2005: cross-language and cross-media image retrieval
by: HOI, Steven C. H., et al.
Published: (2005)

A multimodal and multilevel ranking scheme for large-scale video retrieval
by: HOI, Steven C. H., et al.
Published: (2008)

News video search with fuzzy event clustering using high-level features
by: Neo, S.-Y., et al.
Published: (2013)

MMGCN: Multimodal Graph Convolution Network for Personalized Recommendation of Micro-video
by: Yinwei Wei, et al.
Published: (2020)

Triadic temporal-semantic alignment for weakly-supervised video moment retrieval
by: LIU, Jin, et al.
Published: (2024)