Cross-modal graph with meta concepts for video captioning

Video captioning aims to interpret complex visual content as text descriptions, which requires the model to fully understand video scenes, including objects and their interactions. Prevailing methods adopt off-the-shelf object detection networks to generate object proposals and use the attention...

Bibliographic Details
Main Authors: WANG, Hao, LIN, Guosheng, HOI, Steven C. H., MIAO, Chunyan
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2022
Online Access: https://ink.library.smu.edu.sg/sis_research/7245
Institution: Singapore Management University