Cross-modal graph with meta concepts for video captioning
Video captioning aims to interpret complex visual content as text descriptions, which requires the model to fully understand video scenes, including objects and their interactions. Prevailing methods adopt off-the-shelf object detection networks to generate object proposals and use the attention...
Main Authors: Wang, Hao; Lin, Guosheng; Hoi, Steven C. H.; Miao, Chunyan
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2022
Online Access: https://hdl.handle.net/10356/162546
Institution: Nanyang Technological University
Similar Items
- Cross-modal graph with meta concepts for video captioning
  by: WANG, Hao, et al.
  Published: (2022)
- PERSONALIZED VISUAL INFORMATION CAPTIONING
  by: WU SHUANG
  Published: (2023)
- A Fine-Grained Spatial-Temporal Attention Model for Video Captioning
  by: Liu, A.-A., et al.
  Published: (2021)
- Semantic-filtered Soft-Split-Aware video captioning with audio-augmented feature
  by: Xu, Yuecong, et al.
  Published: (2021)
- Dynamic captioning: Video accessibility enhancement for hearing impairment
  by: Hong, R., et al.
  Published: (2013)