Debiasing visual question and answering with answer preference
Visual Question Answering (VQA) requires models to generate a reasonable answer given an image and a corresponding question, which demands strong reasoning over both kinds of input features. However, most state-of-the-art results rely heavily on superficial correlations in the dataset, since delicately balancing the dataset is almost impossible. This report proposes a simple method that uses answer preference to reduce the impact of data bias and to improve the robustness of VQA models against changes in the answer prior. Two pipelines for using answer preference, one at the training stage and one at the inference stage, are evaluated and achieve genuine improvements on the VQA-CP dataset, which is designed to test VQA models under domain shift.
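The record does not include the report's implementation, but the abstract's inference-stage pipeline suggests a prior-correction scheme. Below is a minimal, hypothetical sketch of one such correction: it estimates per-question-type answer priors from training annotations, then discounts them from a model's answer scores at test time. The function names, the add-one smoothing, and the subtract-log-prior rule are illustrative assumptions, not the author's actual method.

```python
import numpy as np
from collections import Counter, defaultdict

def answer_priors(train_samples, answer_vocab):
    """Estimate a smoothed P(answer | question type) from (qtype, answer) pairs."""
    counts = defaultdict(Counter)
    for qtype, answer in train_samples:
        counts[qtype][answer] += 1
    priors = {}
    for qtype, c in counts.items():
        total = sum(c.values()) + len(answer_vocab)          # add-one smoothing
        priors[qtype] = np.array([(c[a] + 1) / total for a in answer_vocab])
    return priors

def debias_scores(scores, qtype, priors, alpha=1.0):
    """Inference-stage correction (assumed form): subtract the scaled log-prior
    so answers favoured merely by the training distribution are down-weighted."""
    return scores - alpha * np.log(priors[qtype])

# Toy usage with a hypothetical vocabulary and raw model logits.
vocab = ["yes", "no", "2"]
train = [("is there", "yes")] * 8 + [("is there", "no")] * 2
priors = answer_priors(train, vocab)
logits = np.array([2.0, 1.9, -5.0])                          # model favours "yes"
debiased = debias_scores(logits, "is there", priors)
print(vocab[int(np.argmax(debiased))])                       # -> "no"
```

In the toy run, the prior-favoured answer "yes" loses to "no" once the training preference is discounted, which is the qualitative behaviour the VQA-CP benchmark rewards.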
Main Author: Zhang, Xinye
Other Authors: Zhang Hanwang (supervisor)
School: School of Computer Science and Engineering
Format: Final Year Project (FYP)
Degree: Bachelor of Engineering (Computer Science)
Project Code: SCSE19-0193
Language: English
Published: Nanyang Technological University, 2020
Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access: https://hdl.handle.net/10356/137906
Institution: Nanyang Technological University