Cross-modal graph with meta concepts for video captioning

Video captioning aims to interpret complex visual content as text descriptions, which requires the model to fully understand video scenes, including objects and their interactions. Prevailing methods adopt off-the-shelf object detection networks to produce object proposals and use the attention...

Bibliographic Details
Main Authors:	Wang, Hao, Lin, Guosheng, Hoi, Steven C. H., Miao, Chunyan
Other Authors:	School of Computer Science and Engineering
Format:	Article
Language:	English
Published:	2022
Subjects:	Engineering::Computer science and engineering Video Captioning Vision-and-Language
Online Access:	https://hdl.handle.net/10356/162546