Cross-modal graph with meta concepts for video captioning
Video captioning targets interpreting the complex visual contents as text descriptions, which requires the model to fully understand video scenes including objects and their interactions. Prevailing methods adopt off-the-shelf object detection networks to give object proposals and use the attention...
Main Authors: | , , , |
---|---|
Other Authors: | |
Format: | Article |
Language: | English |
Published: | 2022 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/162546 |
Institution: | Nanyang Technological University |