Stack-VS : stacked visual-semantic attention for image caption generation
Recently, automatic image caption generation has become an important focus of work on multimodal translation. Existing approaches can be roughly categorized into two classes, top-down and bottom-up; the former transfers the image information (the visual-level feature) directly into a ca...
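The top-down approaches described above typically score image-region features against the caption decoder's hidden state and pool them into a visual context vector. A minimal sketch of that soft-attention step is below; all dimensions, weights, and names here are hypothetical illustrations, not the paper's actual Stack-VS model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for the sketch: 5 image regions,
# 8-dim visual features, 6-dim decoder hidden state, 4-dim attention space.
num_regions, feat_dim, hid_dim, att_dim = 5, 8, 6, 4

V = rng.normal(size=(num_regions, feat_dim))  # region (visual-level) features
h = rng.normal(size=(hid_dim,))               # current decoder hidden state

# Additive-attention parameters, randomly initialised for illustration.
W_v = rng.normal(size=(feat_dim, att_dim))
W_h = rng.normal(size=(hid_dim, att_dim))
w_a = rng.normal(size=(att_dim,))

def attend(V, h):
    """Score each region against the hidden state, softmax, then pool."""
    scores = np.tanh(V @ W_v + h @ W_h) @ w_a  # one score per region
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over regions
    context = weights @ V                      # weighted visual context vector
    return context, weights

context, weights = attend(V, h)
print(context.shape, float(weights.sum()))
```

The decoder would consume `context` at each generation step; a stacked variant would feed the attended output of one layer into the next.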
Main Authors: Cheng, Ling; Wei, Wei; Mao, Xianling; Liu, Yong; Miao, Chunyan
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2021
Online Access: https://hdl.handle.net/10356/148460
Institution: Nanyang Technological University
Similar Items
- PERSONALIZED VISUAL INFORMATION CAPTIONING, by WU SHUANG. Published: (2023)
- Image captioning via semantic element embedding, by ZHANG, Xiaodan, et al. Published: (2020)
- Learning to collocate Visual-Linguistic Neural Modules for image captioning, by Yang, Xu, et al. Published: (2023)
- Context-aware visual policy network for fine-grained image captioning, by Zha, Zheng-Jun, et al. Published: (2022)
- Deconfounded image captioning: a causal retrospect, by Yang, Xu, et al. Published: (2022)