Deconfounded image captioning: a causal retrospect
Dataset bias in vision-language tasks has become one of the main obstacles hindering the progress of our community. Existing solutions lack a principled analysis of why modern image captioners so easily collapse into dataset bias. In this paper, we present a novel perspective, Deconfounded Image Captioning (DIC), to answer this question, then revisit modern neural image captioners through this lens, and finally propose a DIC framework, DICv1.0, to alleviate the negative effects of dataset bias. DIC is grounded in causal inference, whose two principles, the backdoor and front-door adjustments, help us review previous studies and design new, effective models. In particular, we show that DICv1.0 can strengthen two prevailing captioning models, achieving a single-model 131.1 CIDEr-D on the Karpathy split and 128.4 c40 CIDEr-D on the online split of the challenging MS COCO dataset. Interestingly, DICv1.0 is a natural derivation from our causal retrospect, which opens promising directions for image captioning.
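The backdoor and front-door adjustments mentioned in the abstract are the two standard identification formulas from causal inference. As a reminder of what these principles compute (standard notation from causal inference, not the paper's specific captioning models): for a confounder Z of treatment X and outcome Y, and a mediator M through which X acts on Y,

```latex
% Backdoor adjustment: block the confounding path through Z
P(Y \mid \mathrm{do}(X)) = \sum_{z} P(Y \mid X, z)\, P(z)

% Front-door adjustment: identify the effect via the mediator M
P(Y \mid \mathrm{do}(X)) = \sum_{m} P(m \mid X) \sum_{x'} P(Y \mid x', m)\, P(x')
```

The backdoor formula applies when the confounder Z is observable; the front-door formula covers the case where Z is unobserved but a mediator M fully transmitting X's effect is available.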
Main Authors: Yang, Xu; Zhang, Hanwang; Cai, Jianfei
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2022
Subjects: Engineering::Computer science and engineering; Image Captioning; Causality
Online Access: https://hdl.handle.net/10356/162629
Institution: Nanyang Technological University
id: sg-ntu-dr.10356-162629
record_format: dspace
spelling (record snapshot, 2022-11-01T06:51:01Z):
Title: Deconfounded image captioning: a causal retrospect
Authors: Yang, Xu; Zhang, Hanwang; Cai, Jianfei (School of Computer Science and Engineering)
Subjects: Engineering::Computer science and engineering; Image Captioning; Causality
Abstract: Dataset bias in vision-language tasks has become one of the main obstacles hindering the progress of our community. Existing solutions lack a principled analysis of why modern image captioners so easily collapse into dataset bias. In this paper, we present a novel perspective, Deconfounded Image Captioning (DIC), to answer this question, then revisit modern neural image captioners through this lens, and finally propose a DIC framework, DICv1.0, to alleviate the negative effects of dataset bias. DIC is grounded in causal inference, whose two principles, the backdoor and front-door adjustments, help us review previous studies and design new, effective models. In particular, we show that DICv1.0 can strengthen two prevailing captioning models, achieving a single-model 131.1 CIDEr-D on the Karpathy split and 128.4 c40 CIDEr-D on the online split of the challenging MS COCO dataset. Interestingly, DICv1.0 is a natural derivation from our causal retrospect, which opens promising directions for image captioning.
Issued: 2021
Type: Journal Article
Citation: Yang, X., Zhang, H. & Cai, J. (2021). Deconfounded image captioning: a causal retrospect. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://dx.doi.org/10.1109/TPAMI.2021.3121705
ISSN: 0162-8828
Handle: https://hdl.handle.net/10356/162629
DOI: 10.1109/TPAMI.2021.3121705
PMID: 34673483
Scopus: 2-s2.0-85123727842
Article number: 3121705
Language: en
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Rights: © 2021 IEEE. All rights reserved.
institution: Nanyang Technological University
building: NTU Library
continent: Asia
country: Singapore
content_provider: NTU Library
collection: DR-NTU
language: English
topic: Engineering::Computer science and engineering; Image Captioning; Causality
spellingShingle: Engineering::Computer science and engineering; Image Captioning; Causality; Yang, Xu; Zhang, Hanwang; Cai, Jianfei; Deconfounded image captioning: a causal retrospect
description: Dataset bias in vision-language tasks has become one of the main obstacles hindering the progress of our community. Existing solutions lack a principled analysis of why modern image captioners so easily collapse into dataset bias. In this paper, we present a novel perspective, Deconfounded Image Captioning (DIC), to answer this question, then revisit modern neural image captioners through this lens, and finally propose a DIC framework, DICv1.0, to alleviate the negative effects of dataset bias. DIC is grounded in causal inference, whose two principles, the backdoor and front-door adjustments, help us review previous studies and design new, effective models. In particular, we show that DICv1.0 can strengthen two prevailing captioning models, achieving a single-model 131.1 CIDEr-D on the Karpathy split and 128.4 c40 CIDEr-D on the online split of the challenging MS COCO dataset. Interestingly, DICv1.0 is a natural derivation from our causal retrospect, which opens promising directions for image captioning.
author2: School of Computer Science and Engineering
author_facet: School of Computer Science and Engineering; Yang, Xu; Zhang, Hanwang; Cai, Jianfei
format: Article
author: Yang, Xu; Zhang, Hanwang; Cai, Jianfei
author_sort: Yang, Xu
title / title_short / title_full / title_fullStr / title_full_unstemmed: Deconfounded image captioning: a causal retrospect
title_sort: deconfounded image captioning: a causal retrospect
publishDate: 2022
url: https://hdl.handle.net/10356/162629
_version_: 1749179139940679680