Cross-modal graph with meta concepts for video captioning
Video captioning aims to interpret complex visual content as text descriptions, which requires the model to fully understand video scenes, including objects and their interactions. Prevailing methods adopt off-the-shelf object detection networks to generate object proposals and use the attention...
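As a rough illustration of the attention over detector-produced object proposals that the abstract refers to, here is a minimal PyTorch sketch. It is not the paper's cross-modal graph model; the module name, feature dimensions, and additive-attention formulation below are all assumptions made for illustration.

```python
# Minimal sketch (illustration only, not the paper's method): additive attention
# that weights pre-extracted object-proposal features by a caption decoder state.
import torch
import torch.nn as nn


class ProposalAttention(nn.Module):
    """Attend over N object-proposal features given the decoder hidden state."""

    def __init__(self, feat_dim: int = 2048, hid_dim: int = 512, att_dim: int = 256):
        super().__init__()
        self.proj_feat = nn.Linear(feat_dim, att_dim)  # project proposal features
        self.proj_hid = nn.Linear(hid_dim, att_dim)    # project decoder state
        self.score = nn.Linear(att_dim, 1)             # scalar attention energy

    def forward(self, proposals: torch.Tensor, hidden: torch.Tensor):
        # proposals: (batch, num_proposals, feat_dim), e.g. detector RoI features
        # hidden:    (batch, hid_dim), current decoder hidden state
        energy = torch.tanh(self.proj_feat(proposals) + self.proj_hid(hidden).unsqueeze(1))
        weights = torch.softmax(self.score(energy).squeeze(-1), dim=-1)  # (batch, N)
        context = (weights.unsqueeze(-1) * proposals).sum(dim=1)         # (batch, feat_dim)
        return context, weights


if __name__ == "__main__":
    att = ProposalAttention()
    ctx, w = att(torch.randn(2, 36, 2048), torch.randn(2, 512))
    print(ctx.shape, w.shape)  # torch.Size([2, 2048]) torch.Size([2, 36])
```

In this kind of setup the context vector is fed to the caption decoder at each step, so the generated words can be grounded in specific detected objects rather than in a single global video feature.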
Main Authors: WANG, Hao; LIN, Guosheng; HOI, Steven C. H.; MIAO, Chunyan
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2022
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/7245
Institution: Singapore Management University
Similar Items
- Cross-modal graph with meta concepts for video captioning
  by: Wang, Hao, et al.
  Published: (2022)
- PERSONALIZED VISUAL INFORMATION CAPTIONING
  by: WU SHUANG
  Published: (2023)
- Learning transferable perturbations for image captioning
  by: WU, Hanjie, et al.
  Published: (2022)
- More is better: precise and detailed image captioning using online positive recall and missing concepts mining
  by: Zhang, Mingxing, et al.
  Published: (2020)
- A Fine-Grained Spatial-Temporal Attention Model for Video Captioning
  by: Liu, A.-A., et al.
  Published: (2021)