Learning to collocate Visual-Linguistic Neural Modules for image captioning

Humans tend to decompose a sentence into different parts like sth do sth at someplace and then fill each part with certain content. Inspired by this, we follow the principle of modular design to propose a novel image captioner: learning to Collocate Visual-Linguistic Neural Modules (CVLNM). Unlike t...

Full description

Saved in:
Bibliographic Details
Main Authors: Yang, Xu, Zhang, Hanwang, Gao, Chongyang, Cai, Jianfei
Other Authors: School of Computer Science and Engineering
Format: Article
Language:English
Published: 2023
Subjects:
Online Access:https://hdl.handle.net/10356/170425
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English