Cross-modal graph with meta concepts for video captioning

Video captioning aims to interpret complex visual content as text descriptions, which requires the model to fully understand video scenes, including objects and their interactions. Prevailing methods adopt off-the-shelf object detection networks to generate object proposals and use the attention...

Bibliographic Details
Main Authors:	Wang, Hao; Lin, Guosheng; Hoi, Steven C. H.; Miao, Chunyan
Other Authors:	School of Computer Science and Engineering
Format:	Article
Language:	English
Published:	2022
Subjects:	Engineering::Computer science and engineering; Video Captioning; Vision-and-Language
Online Access:	https://hdl.handle.net/10356/162546
Institution:	Nanyang Technological University

Similar Items