Deconfounded image captioning: a causal retrospect
Dataset bias in vision-language tasks has become one of the main obstacles hindering the progress of our community. Existing solutions lack a principled analysis of why modern image captioners so easily collapse into dataset bias. In this paper, we present a novel perspective, Deconfounded Image Captioning (DIC), to answer this question, then revisit modern neural image captioners through this lens, and finally propose a DIC framework, DICv1.0, to alleviate the negative effects of dataset bias. DIC is grounded in causal inference, whose two principles, the backdoor and front-door adjustments, help us review previous studies and design new, effective models. In particular, we show that DICv1.0 can strengthen two prevailing captioning models, achieving a single-model 131.1 CIDEr-D on the Karpathy split and 128.4 c40 CIDEr-D on the online split of the challenging MS COCO dataset. Interestingly, DICv1.0 is a natural derivation from our causal retrospect, which opens promising directions for image captioning.
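The backdoor and front-door adjustments mentioned in the abstract are the two standard identification formulas from causal inference. As a reminder of what these principles compute (standard notation from causal inference, not the paper's specific captioning models): for a confounder Z of treatment X and outcome Y, and a mediator M through which X acts on Y,

```latex
% Backdoor adjustment: block the confounding path through Z
P(Y \mid \mathrm{do}(X)) = \sum_{z} P(Y \mid X, z)\, P(z)

% Front-door adjustment: identify the effect via the mediator M
P(Y \mid \mathrm{do}(X)) = \sum_{m} P(m \mid X) \sum_{x'} P(Y \mid x', m)\, P(x')
```

The backdoor formula applies when the confounder Z is observable; the front-door formula covers the case where Z is unobserved but a mediator M fully transmitting X's effect is available.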
Main Authors: Yang, Xu; Zhang, Hanwang; Cai, Jianfei
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2022
Subjects: Engineering::Computer science and engineering; Image Captioning; Causality
Online Access: https://hdl.handle.net/10356/162629
Institution: Nanyang Technological University
id: sg-ntu-dr.10356-162629
record_format: dspace
spelling (record snapshot, 2022-11-01T06:51:01Z):
Title: Deconfounded image captioning: a causal retrospect
Authors: Yang, Xu; Zhang, Hanwang; Cai, Jianfei (School of Computer Science and Engineering)
Subjects: Engineering::Computer science and engineering; Image Captioning; Causality
Abstract: Dataset bias in vision-language tasks has become one of the main obstacles hindering the progress of our community. Existing solutions lack a principled analysis of why modern image captioners so easily collapse into dataset bias. In this paper, we present a novel perspective, Deconfounded Image Captioning (DIC), to answer this question, then revisit modern neural image captioners through this lens, and finally propose a DIC framework, DICv1.0, to alleviate the negative effects of dataset bias. DIC is grounded in causal inference, whose two principles, the backdoor and front-door adjustments, help us review previous studies and design new, effective models. In particular, we show that DICv1.0 can strengthen two prevailing captioning models, achieving a single-model 131.1 CIDEr-D on the Karpathy split and 128.4 c40 CIDEr-D on the online split of the challenging MS COCO dataset. Interestingly, DICv1.0 is a natural derivation from our causal retrospect, which opens promising directions for image captioning.
Issued: 2021
Type: Journal Article
Citation: Yang, X., Zhang, H. & Cai, J. (2021). Deconfounded image captioning: a causal retrospect. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://dx.doi.org/10.1109/TPAMI.2021.3121705
ISSN: 0162-8828
Handle: https://hdl.handle.net/10356/162629
DOI: 10.1109/TPAMI.2021.3121705
PMID: 34673483
Scopus: 2-s2.0-85123727842
Article number: 3121705
Language: en
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Rights: © 2021 IEEE. All rights reserved.
institution: Nanyang Technological University
building: NTU Library
continent: Asia
country: Singapore
content_provider: NTU Library
collection: DR-NTU
language: English
topic: Engineering::Computer science and engineering; Image Captioning; Causality
spellingShingle: Engineering::Computer science and engineering; Image Captioning; Causality; Yang, Xu; Zhang, Hanwang; Cai, Jianfei; Deconfounded image captioning: a causal retrospect
description: Dataset bias in vision-language tasks has become one of the main obstacles hindering the progress of our community. Existing solutions lack a principled analysis of why modern image captioners so easily collapse into dataset bias. In this paper, we present a novel perspective, Deconfounded Image Captioning (DIC), to answer this question, then revisit modern neural image captioners through this lens, and finally propose a DIC framework, DICv1.0, to alleviate the negative effects of dataset bias. DIC is grounded in causal inference, whose two principles, the backdoor and front-door adjustments, help us review previous studies and design new, effective models. In particular, we show that DICv1.0 can strengthen two prevailing captioning models, achieving a single-model 131.1 CIDEr-D on the Karpathy split and 128.4 c40 CIDEr-D on the online split of the challenging MS COCO dataset. Interestingly, DICv1.0 is a natural derivation from our causal retrospect, which opens promising directions for image captioning.
author2: School of Computer Science and Engineering
author_facet: School of Computer Science and Engineering; Yang, Xu; Zhang, Hanwang; Cai, Jianfei
format: Article
author: Yang, Xu; Zhang, Hanwang; Cai, Jianfei
author_sort: Yang, Xu
title / title_short / title_full / title_fullStr / title_full_unstemmed: Deconfounded image captioning: a causal retrospect
title_sort: deconfounded image captioning: a causal retrospect
publishDate: 2022
url: https://hdl.handle.net/10356/162629
_version_: 1749179139940679680