Keyword-driven image captioning via Context-dependent Bilateral LSTM

Image captioning has recently received much attention. Existing approaches, however, are limited to describing images with simple contextual information, which typically generate one sentence to describe each image with only a single contextual emphasis. In this paper, we address this limitation fro...

Bibliographic Details
Main Authors: ZHANG, Xiaodan, HE, Shengfeng, SONG, Xinhang, WEI, Pengxu, JIANG, Shuqiang, YE, Qixiang, JIAO, Jianbin, LAU, Rynson W. H.
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2017
Subjects: Image captioning; Keyword-driven; LSTM; Artificial Intelligence and Robotics
Online Access: https://ink.library.smu.edu.sg/sis_research/8497
Institution: Singapore Management University
id sg-smu-ink.sis_research-9500
record_format dspace
spelling sg-smu-ink.sis_research-9500 2024-01-04T04:18:03Z 2017-07-14T07:00:00Z text https://ink.library.smu.edu.sg/sis_research/8497 info:doi/10.1109/icme.2017.8019525 Research Collection School Of Computing and Information Systems eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Image captioning
Keyword-driven
LSTM
Artificial Intelligence and Robotics
description Image captioning has recently received much attention. Existing approaches, however, are limited to describing images with simple contextual information, which typically generate one sentence to describe each image with only a single contextual emphasis. In this paper, we address this limitation from a user perspective with a novel approach. Given some keywords as additional inputs, the proposed method would generate various descriptions according to the provided guidance. Hence, descriptions with different focuses can be generated for the same image. Our method is based on a new Context-dependent Bilateral Long Short-Term Memory (CDB-LSTM) model to predict a keyword-driven sentence by considering the word dependence. The word dependence is explored externally with a bilateral pipeline, and internally with a unified and joint training process. Experiments on the MS COCO dataset demonstrate that the proposed approach not only significantly outperforms the baseline method but also shows good adaptation and consistency with various keywords.
format text
author ZHANG, Xiaodan
HE, Shengfeng
SONG, Xinhang
WEI, Pengxu
JIANG, Shuqiang
YE, Qixiang
JIAO, Jianbin
LAU, Rynson W. H.
author_sort ZHANG, Xiaodan
title Keyword-driven image captioning via Context-dependent Bilateral LSTM
title_sort keyword-driven image captioning via context-dependent bilateral lstm
publisher Institutional Knowledge at Singapore Management University
publishDate 2017
url https://ink.library.smu.edu.sg/sis_research/8497
_version_ 1787590780556148736