More is better : precise and detailed image captioning using online positive recall and missing concepts mining

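Recently, great progress in automatic image captioning has been achieved by using semantic concepts detected from the image. However, we argue that the existing concepts-to-caption framework, in which the concept detector is trained using image-caption pairs to minimize the vocabulary discrepancy, suffers from insufficient concepts. The reasons are two-fold: 1) the extreme imbalance between the numbers of positive and negative samples for each concept and 2) the incomplete labeling of training captions caused by biased annotation and the use of synonyms. In this paper, we propose a method, termed online positive recall and missing concepts mining, to overcome these problems. Our method adaptively re-weights the loss of different samples according to their predictions for online positive recall and uses a two-stage optimization strategy for missing concepts mining. In this way, more semantic concepts can be detected and high accuracy can be expected. In the caption generation stage, we explore an element-wise selection process to automatically choose the most suitable concepts at each time step. Thus, our method can generate more precise and detailed captions to describe the image. We conduct extensive experiments on the MSCOCO image captioning data set and the MSCOCO online test server, which show that our method achieves superior image captioning performance compared with other competitive methods.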
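
The abstract names three mechanisms without giving their formulas: per-sample loss re-weighting for online positive recall, a two-stage optimization that mines missing concepts, and element-wise concept selection during decoding. The Python sketch below illustrates one plausible reading of each under explicit assumptions; it is not the authors' implementation. A focal-loss-style factor `(1 - p_t) ** gamma` stands in for the paper's weighting, the stage-2 rule that drops confidently predicted "negatives" (and its `threshold=0.9`) is hypothetical, and the per-concept sigmoid gate in `elementwise_select` is an assumed form of the selection step. All function names and parameters are illustrative.

```python
import math
from typing import List, Optional, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def reweighted_bce(probs: Sequence[float], labels: Sequence[int],
                   gamma: float = 2.0, mask: Optional[Sequence[int]] = None,
                   eps: float = 1e-7) -> float:
    """Multi-label concept-detection loss for one image (assumed form).

    probs[k]:  predicted probability that concept k is present
    labels[k]: 1 if concept k occurs in the image's training captions, else 0
    mask[k]:   0 drops concept k from the loss (used in stage 2)

    Each binary cross-entropy term is scaled by (1 - p_t) ** gamma, where p_t
    is the probability given to the true label. Confidently correct samples,
    mostly the abundant easy negatives, contribute little, so hard samples
    (including poorly recalled positives) dominate the loss.
    """
    total, count = 0.0, 0
    for k, (p, y) in enumerate(zip(probs, labels)):
        if mask is not None and mask[k] == 0:
            continue
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
        count += 1
    return total / max(count, 1)


def mine_missing_concepts(probs: Sequence[float], labels: Sequence[int],
                          threshold: float = 0.9) -> List[int]:
    """Hypothetical stage-2 mask: a concept labelled negative but predicted
    with probability >= threshold by the stage-1 detector is treated as a
    possibly missing label (e.g. a synonym was used in the captions) and is
    excluded from the loss instead of being pushed toward zero."""
    return [0 if (y == 0 and p >= threshold) else 1
            for p, y in zip(probs, labels)]


def elementwise_select(concept_scores: Sequence[float], hidden: Sequence[float],
                       weights: Sequence[Sequence[float]],
                       bias: Sequence[float]) -> List[float]:
    """Assumed selection step: gate each concept score with the decoder state.

    gate[k] = sigmoid(weights[k] . hidden + bias[k])
    output[k] = gate[k] * concept_scores[k]
    Because the gates depend on the current hidden state, a different subset
    of concepts can pass through at every time step.
    """
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
             for row, b in zip(weights, bias)]
    return [g * s for g, s in zip(gates, concept_scores)]


if __name__ == "__main__":
    probs = [0.95, 0.10, 0.92, 0.30]  # toy detector outputs for 4 concepts
    labels = [1, 0, 0, 0]             # only concept 0 appears in the captions
    mask = mine_missing_concepts(probs, labels)
    print(mask)                       # [1, 1, 0, 1]
    print(reweighted_bce(probs, labels), reweighted_bce(probs, labels, mask=mask))
```

In this sketch, stage 1 would train the detector with `reweighted_bce` on caption-derived labels; stage 2 would compute the mask from the stage-1 predictions and continue training with it, so concepts the captions likely omitted stop being penalized as negatives.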

Bibliographic Details
Main Authors:	Zhang, Mingxing; Yang, Yang; Zhang, Hanwang; Ji, Yanli; Shen, Heng Tao; Chua, Tat-Seng
Other Authors:	School of Computer Science and Engineering
Format:	Article
Language:	English
Published:	2020
Subjects:	Engineering::Computer science and engineering; Precise and Detailed Image Captioning; Semantic Concepts
Online Access:	https://hdl.handle.net/10356/142314
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-142314
record_format	dspace
date_accessioned	2020-06-19T02:54:56Z
date_issued	2018
type	Journal Article
citation	Zhang, M., Yang, Y., Zhang, H., Ji, Y., Shen, H. T., & Chua, T.-S. (2019). More is better : precise and detailed image captioning using online positive recall and missing concepts mining. IEEE Transactions on Image Processing, 28(1), 32-44. doi:10.1109/TIP.2018.2855415
journal	IEEE Transactions on Image Processing
volume	28
issue	1
pages	32-44
issn	1057-7149
doi	10.1109/TIP.2018.2855415
pmid	30010565
scopus_id	2-s2.0-85049964023
rights	© 2018 IEEE. All rights reserved.
institution	Nanyang Technological University
building	NTU Library
country	Singapore
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering; Precise and Detailed Image Captioning; Semantic Concepts
description	Recently, great progress in automatic image captioning has been achieved by using semantic concepts detected from the image. However, we argue that the existing concepts-to-caption framework, in which the concept detector is trained using image-caption pairs to minimize the vocabulary discrepancy, suffers from insufficient concepts. The reasons are two-fold: 1) the extreme imbalance between the numbers of positive and negative samples for each concept and 2) the incomplete labeling of training captions caused by biased annotation and the use of synonyms. In this paper, we propose a method, termed online positive recall and missing concepts mining, to overcome these problems. Our method adaptively re-weights the loss of different samples according to their predictions for online positive recall and uses a two-stage optimization strategy for missing concepts mining. In this way, more semantic concepts can be detected and high accuracy can be expected. In the caption generation stage, we explore an element-wise selection process to automatically choose the most suitable concepts at each time step. Thus, our method can generate more precise and detailed captions to describe the image. We conduct extensive experiments on the MSCOCO image captioning data set and the MSCOCO online test server, which show that our method achieves superior image captioning performance compared with other competitive methods.
author2	School of Computer Science and Engineering
format	Article
author	Zhang, Mingxing; Yang, Yang; Zhang, Hanwang; Ji, Yanli; Shen, Heng Tao; Chua, Tat-Seng
author_sort	Zhang, Mingxing
title	More is better : precise and detailed image captioning using online positive recall and missing concepts mining
publishDate	2020
url	https://hdl.handle.net/10356/142314