Visual relationship detection

Bibliographic Details
Main Author: Lee, Xavier Eugene
Other Authors: Hanwang Zhang
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Online Access: https://hdl.handle.net/10356/175285
Institution: Nanyang Technological University
Description
Summary: Current scene graph generation (SGG) models struggle to detect visual relationships between objects in images accurately because their training datasets are severely biased. For instance, biased SGG models often predict trivial, uninformative relationships such as “on” rather than more descriptive ones like “running on”, or “by” instead of “walking by”. Debiasing SGG presents its own challenges, however, owing to long-tailed biases, bounded rationality, and language or reporting biases present during training. This paper presents an SGG framework built on the novel Total Direct Effect (TDE) analysis within causal inference. The proposed framework is compared against a conventional causal effect framework: an SGG framework with Total Effect (TE) analysis. While both frameworks construct factual causal graphs from traditional biased training, the TDE SGG models additionally apply counterfactual causality to the trained graphs to remove bad biases. Either TE or TDE is then used to compute and predict the predicates in the respective framework. A thorough analysis and evaluation of the proposed SGG framework concludes that it outperforms conventional SGG methods in object and relationship prediction accuracy across all the relationship retrieval tasks tested. This research thus aims to contribute to the field of visual relationship detection.
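
The counterfactual step mentioned in the summary can be illustrated with a toy sketch. It assumes that TDE is obtained by subtracting the logits of a counterfactual pass (visual evidence masked, object context kept) from the logits of the factual pass, which is a common formulation in causal SGG work; this record does not give the project's exact computation. The predicate list, the logit values, and the function names are all hypothetical.

```python
# Toy illustration only: predicates and logit values are invented,
# not taken from the project described in this record.
PREDICATES = ["on", "running on", "by", "walking by"]


def argmax(values):
    """Return the index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def total_direct_effect(factual, counterfactual):
    """Element-wise difference: factual logits minus counterfactual logits."""
    return [f - c for f, c in zip(factual, counterfactual)]


# Factual pass: logits from a biased model that sees the real visual features.
factual_logits = [4.0, 3.5, 1.0, 0.5]

# Counterfactual pass (assumed): same object context, visual features masked,
# so these logits mostly reflect the dataset bias toward frequent predicates.
counterfactual_logits = [3.8, 1.0, 0.9, 0.2]

# Without debiasing, the frequent but uninformative predicate wins.
biased_prediction = PREDICATES[argmax(factual_logits)]

# Subtracting the counterfactual logits removes the context-only bias.
tde_logits = total_direct_effect(factual_logits, counterfactual_logits)
debiased_prediction = PREDICATES[argmax(tde_logits)]

print(biased_prediction, "->", debiased_prediction)
```

In this toy case the biased model prefers “on”, while the predicate with the largest difference between the factual and counterfactual logits is “running on”, mirroring the example given in the summary.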