Compositional prompting video-language models to understand procedure in instructional videos
Instructional videos, which naturally contain abundant clip-narration pairs, are very useful for completing complex daily tasks. Existing work on procedure understanding typically pretrains video-language models on these pairs and then fine-tunes downstream classifiers and localizers...
Main Authors: | Hu, Guyue; He, Bin; Zhang, Hanwang |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Article |
Language: | English |
Published: | 2023 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/168985 |
Institution: | Nanyang Technological University |
Similar Items
- ClusterPrompt: Cluster semantic enhanced prompt learning for new intent discovery
  by: LIANG, Jinggui, et al.
  Published: (2023)
- Computer aided instruction on mobile video communications fundamentals
  by: Makilan, Jessen Marc A., et al.
  Published: (2008)
- A prompt-based topic-modeling method for depression detection on low-resource data
  by: GUO, Yanrong, et al.
  Published: (2024)
- Annotating videos that teach MS Excel and predicting mouse / keyboard actions
  by: Tan, Genson Yao Jie
  Published: (2024)
- MultiGPrompt for multi-task pre-training and prompting on graphs
  by: YU, Xingtong, et al.
  Published: (2024)