Visual relationship detection
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/175285 |
Institution: | Nanyang Technological University |
Summary: | Current scene graph generation (SGG) models struggle to detect visual relationships
between objects in images accurately and effectively because their training datasets are
severely biased. For instance, biased SGG models often predict trivial and
uninformative relationships such as “on” instead of more descriptive ones like “running
on”, or “by” instead of “walking by”. Debiasing SGG, however, presents its own set of
challenges, owing to the long-tailed biases, bounded rationality, and
language or reporting biases present during training.
This paper presents an SGG framework with the novel Total Direct Effect (TDE) analysis
from causal inference. The proposed framework is compared against a conventional causal
effect framework: an SGG framework with Total Effect (TE) analysis. While both frameworks
construct factual causal graphs from traditional biased training, the TDE SGG models further
apply counterfactual causality to the trained graphs to remove bad biases. Each framework
then uses its own measure, TE or TDE, to calculate and predict the predicates.
Thorough analysis and evaluation of the proposed SGG framework show that it
outperforms conventional SGG methods in object
and relationship prediction accuracy across all the relationship retrieval tasks tested. This
research thus aims to contribute the proposed framework to the field of visual
relationship detection. |
---|
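The summary does not give the TE and TDE formulas, so the following is only a minimal sketch of the distinction it describes. It assumes a common reading from the causal-inference literature: TE subtracts the prediction for a fully blanked input from the factual prediction, while TDE subtracts a counterfactual prediction in which the visual evidence is blanked but the object context is kept. The additive scorer, the predicate list, and all function names are hypothetical and are not taken from the project.

```python
from typing import List

# Hypothetical two-predicate vocabulary, used only for illustration.
PREDICATES: List[str] = ["on", "running on"]


def predicate_logits(visual: List[float], context: List[float]) -> List[float]:
    """Toy additive scorer (an assumption): visual evidence plus context prior."""
    return [v + c for v, c in zip(visual, context)]


def total_effect(visual: List[float], context: List[float],
                 visual_base: List[float], context_base: List[float]) -> List[float]:
    """TE sketch: factual logits minus logits with both inputs blanked."""
    factual = predicate_logits(visual, context)
    baseline = predicate_logits(visual_base, context_base)
    return [f - b for f, b in zip(factual, baseline)]


def total_direct_effect(visual: List[float], context: List[float],
                        visual_base: List[float]) -> List[float]:
    """TDE sketch: factual logits minus counterfactual logits in which the
    visual input is blanked but the (biased) context is kept as observed."""
    factual = predicate_logits(visual, context)
    counterfactual = predicate_logits(visual_base, context)
    return [f - c for f, c in zip(factual, counterfactual)]


def best_predicate(scores: List[float]) -> str:
    """Return the predicate with the highest score."""
    return PREDICATES[max(range(len(scores)), key=scores.__getitem__)]


# Made-up numbers: the image favours "running on", the dataset prior favours "on".
visual = [1.0, 2.0]
context = [3.0, 0.0]
blank = [0.0, 0.0]

te_choice = best_predicate(total_effect(visual, context, blank, blank))
tde_choice = best_predicate(total_direct_effect(visual, context, blank))
```

With these made-up numbers, the TE score still carries the context prior and selects “on”, whereas the TDE score cancels the prior and selects “running on”. In this additive toy the subtraction simply recovers the visual term; a trained SGG model is nonlinear, so the counterfactual pass would have to be computed by the network itself.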