Cross-modal graph with meta concepts for video captioning

Video captioning targets interpreting the complex visual contents as text descriptions, which requires the model to fully understand video scenes including objects and their interactions. Prevailing methods adopt off-the-shelf object detection networks to give object proposals and use the attention...

全面介紹

Saved in:
書目詳細資料
Main Authors: Wang, Hao, Lin, Guosheng, Hoi, Steven C. H., Miao, Chunyan
其他作者: School of Computer Science and Engineering
格式: Article
語言:English
出版: 2022
主題:
在線閱讀:https://hdl.handle.net/10356/162546
標簽: 添加標簽
沒有標簽, 成為第一個標記此記錄!
機構: Nanyang Technological University
語言: English