Stack-VS : stacked visual-semantic attention for image caption generation
Recently, automatic image caption generation has become an important focus of work on multimodal translation. Existing approaches can be roughly categorized into two classes, top-down and bottom-up; the former transfers the image information (the visual-level feature) directly into a ca...
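The top-down approaches described above typically score image-region features against the caption decoder's hidden state and pool them into a visual context vector. A minimal sketch of that soft-attention step is below; all dimensions, weights, and names here are hypothetical illustrations, not the paper's actual Stack-VS model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for the sketch: 5 image regions,
# 8-dim visual features, 6-dim decoder hidden state, 4-dim attention space.
num_regions, feat_dim, hid_dim, att_dim = 5, 8, 6, 4

V = rng.normal(size=(num_regions, feat_dim))  # region (visual-level) features
h = rng.normal(size=(hid_dim,))               # current decoder hidden state

# Additive-attention parameters, randomly initialised for illustration.
W_v = rng.normal(size=(feat_dim, att_dim))
W_h = rng.normal(size=(hid_dim, att_dim))
w_a = rng.normal(size=(att_dim,))

def attend(V, h):
    """Score each region against the hidden state, softmax, then pool."""
    scores = np.tanh(V @ W_v + h @ W_h) @ w_a  # one score per region
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over regions
    context = weights @ V                      # weighted visual context vector
    return context, weights

context, weights = attend(V, h)
print(context.shape, float(weights.sum()))
```

The decoder would consume `context` at each generation step; a stacked variant would feed the attended output of one layer into the next.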
Main Authors: Cheng, Ling; Wei, Wei; Mao, Xianling; Liu, Yong; Miao, Chunyan
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2021
Online Access: https://hdl.handle.net/10356/148460
Institution: Nanyang Technological University
Similar Items
- PERSONALIZED VISUAL INFORMATION CAPTIONING, by WU SHUANG. Published: (2023)
- Image captioning via semantic element embedding, by ZHANG, Xiaodan, et al. Published: (2020)
- Learning to collocate Visual-Linguistic Neural Modules for image captioning, by Yang, Xu, et al. Published: (2023)
- Context-aware visual policy network for fine-grained image captioning, by Zha, Zheng-Jun, et al. Published: (2022)
- Deconfounded image captioning: a causal retrospect, by Yang, Xu, et al. Published: (2022)