Deconfounded image captioning: a causal retrospect

Dataset bias in vision-language tasks is becoming one of the main problems hindering the progress of our community. Existing solutions lack a principled analysis of why modern image captioners easily collapse into dataset bias. In this paper, we present a novel perspective, Deconfounded Image Captioning (DIC), to answer this question, then retrospect modern neural image captioners, and finally propose a DIC framework, DICv1.0, to alleviate the negative effects brought by dataset bias. DIC is based on causal inference, whose two principles, the backdoor and front-door adjustments, help us review previous studies and design new effective models. In particular, we showcase that DICv1.0 can strengthen two prevailing captioning models, achieving a single-model 131.1 CIDEr-D on the Karpathy split and 128.4 c40 CIDEr-D on the online split of the challenging MS COCO dataset. Interestingly, DICv1.0 is a natural derivation from our causal retrospect, which opens promising directions for image captioning.
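
For reference, the backdoor and front-door adjustments named in the abstract take the following standard forms from causal inference (generic notation with a confounder z and a mediator m; these symbols are illustrative and not the paper's specific variables):

\[ P(Y \mid do(X)) = \sum_{z} P(Y \mid X, z)\, P(z) \quad \text{(backdoor adjustment)} \]
\[ P(Y \mid do(X)) = \sum_{m} P(m \mid X) \sum_{x'} P(Y \mid m, x')\, P(x') \quad \text{(front-door adjustment)} \]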

Bibliographic Details
Main Authors: Yang, Xu; Zhang, Hanwang; Cai, Jianfei
Other Authors: School of Computer Science and Engineering
Format: Journal Article
Language: English
Published: 2022
Subjects: Engineering::Computer science and engineering; Image Captioning; Causality
Online Access: https://hdl.handle.net/10356/162629
Institution: Nanyang Technological University
Citation: Yang, X., Zhang, H. & Cai, J. (2021). Deconfounded image captioning: a causal retrospect. IEEE Transactions on Pattern Analysis and Machine Intelligence, 3121705. https://dx.doi.org/10.1109/TPAMI.2021.3121705
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
ISSN: 0162-8828
DOI: 10.1109/TPAMI.2021.3121705
PMID: 34673483
Scopus ID: 2-s2.0-85123727842
Collection: DR-NTU (NTU Library, Nanyang Technological University, Singapore)
Rights: © 2021 IEEE. All rights reserved.