Compositional prompting video-language models to understand procedure in instructional videos

Instructional videos, which naturally contain abundant clip-narration pairs, are very useful for completing complex daily tasks. Existing works on procedure understanding focus on pretraining various video-language models with these pairs and then fine-tuning downstream classifiers and localizers...

Bibliographic Details
Main Authors:	Hu, Guyue; He, Bin; Zhang, Hanwang
Other Authors:	School of Computer Science and Engineering
Format:	Article
Language:	English
Published:	2023
Subjects:	Engineering::Computer science and engineering; Prompt Learning; Instructional Videos
Online Access:	https://hdl.handle.net/10356/168985
Institution:	Nanyang Technological University

Similar Items

ClusterPrompt: Cluster semantic enhanced prompt learning for new intent discovery
by: LIANG, Jinggui, et al.
Published: (2023)

Computer aided instruction on mobile video communications fundamentals
by: Makilan, Jessen Marc A., et al.
Published: (2008)

A prompt-based topic-modeling method for depression detection on low-resource data
by: GUO, Yanrong, et al.
Published: (2024)

Annotating videos that teach MS Excel and predicting mouse / keyboard actions
by: Tan, Genson Yao Jie
Published: (2024)

MultiGPrompt for multi-task pre-training and prompting on graphs
by: YU, Xingtong, et al.
Published: (2024)

How people prompt generative AI to create interactive VR scenes
by: AGHEL MANESH, Setareh, et al.
Published: (2024)

S-prompts learning with pre-trained transformers: An Occam's razor for domain incremental learning
by: WANG, Yabin, et al.
Published: (2022)

Self-supervised video hashing with hierarchical binary auto-encoder
by: Song, Jingkuan, et al.
Published: (2020)

THE INFLUENCE OF DIFFERENT MODALITIES ON THE DESIGN OF PROMPTS IN A LOWER LIMB EXERGAME FOR COGNITIVELY IMPAIRED OLDER ADULTS
by: LIOW WEI TING
Published: (2021)

RELATION UNDERSTANDING IN VIDEOS
by: SHANG XINDI
Published: (2021)

Effects of bin proximity and informational prompts on recycling and contamination
by: ROSENTHAL, Sonny, et al.
Published: (2021)

Play and Rewind: Optimizing Binary Representations of Videos by Self-Supervised Temporal Hashing
by: Hanwang Zhang, et al.
Published: (2020)

Voucher abuse detection with prompt-based fine-tuning on graph neural networks
by: WEN, Zhihao, et al.
Published: (2023)

Delving into multimodal prompting for fine-grained visual classification
by: JIANG, Xin, et al.
Published: (2024)

Compositional prompt tuning with motion cues for open-vocabulary video relation detection
by: GAO, Kaifeng, et al.
Published: (2023)

Micro Tells Macro: Predicting the Popularity of Micro-Videos via a Transductive Model
by: Jingyuan Chen, et al.
Published: (2020)

Unbiased multiple instance learning for weakly supervised video anomaly detection
by: Lv, Hui, et al.
Published: (2023)

Multi-graph based active learning for interactive video retrieval
by: ZHANG XIAOMING
Published: (2010)

NoteVideo: Facilitating navigation of blackboard-style lecture videos
by: Monserrat, T.-J.K.P., et al.
Published: (2014)

Teaching internal control using a student-generated video project
by: SEOW, Poh Sun, et al.
Published: (2018)

Integrated framework for developing instructional videos for foundational computing courses
by: SHIM, Kyong Jin, et al.
Published: (2021)

TOWARDS ATTENTION-AWARE CONCEPT MAP BASED REVIEW IN VIDEO LEARNING
by: ZHANG SHAN
Published: (2023)

Multimodal distillation for egocentric video understanding
by: Peng, Han
Published: (2024)

Augmenting low-resource text classification with graph-grounded pre-training and prompting
by: WEN, Zhihao, et al.
Published: (2023)

Video quality for video analysis
by: PAVEL KORSHUNOV
Published: (2011)

L.IVE an integrated interactive video-based learning environment
by: Monserrat, T.-J.K., et al.
Published: (2014)

L.IVE: An integrated interactive video-based learning environment
by: Monserrat, T.-J.K., et al.
Published: (2014)

Stargazer: An interactive camera robot for capturing how-to videos based on subtle instructor cues
by: LI, Jiannan, et al.
Published: (2023)

Solution generation for university math problems using large language models
by: Wirja, Louis
Published: (2024)

VideoQA: Question answering on news video
by: Yang, H., et al.
Published: (2013)

DISCOV: A framework for discovering objects in video
by: Liu D., et al.
Published: (2018)

DeepQoE : a multimodal learning framework for video quality of experience (QoE) prediction
by: Zhang, Huaizheng, et al.
Published: (2021)

Temporal sentence grounding in videos: a survey and future directions
by: Zhang, Hao, et al.
Published: (2023)

Automatic parsing and indexing of news video
by: Zhang, H., et al.
Published: (2014)

Enhancing visual grounding in vision-language pre-training with position-guided text prompts
by: WANG, Alex Jinpeng, et al.
Published: (2024)

APPLICATION OF MULTIMEDIA IN E-LEARNING: LECTURE VIDEOS AND MULTIMODAL SYSTEMS
by: SUBHASREE BASU
Published: (2018)

Creating videos for lectures
by: Asis, Love
Published: (2021)

Paying attention to video object pattern understanding
by: WANG, Wenguan, et al.
Published: (2021)

Weakly supervised video anomaly detection and localization with spatio-temporal prompts
by: WU, Peng, et al.
Published: (2026)

Prompt tuning on Graph-Augmented Low-Resource text classification
by: WEN, Zhihao, et al.
Published: (2024)