Scene graph extraction from images
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2022 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/156443 |
Institution: | Nanyang Technological University |
Summary: | An image contains a great deal of information, and that information can be used by high-level complex systems for Computer Vision tasks. Most Computer Vision tasks, such as Image Classification and Object Detection, only require outputting an image-level prediction or the localization of objects in the image. However, these outputs are not sufficient for a comprehensive interpretation of all the information in an image. To convey all the information within an image, a generated Scene Graph can be used. A Scene Graph is a structured representation of a scene that clearly expresses the objects and their attributes as nodes, and the relationships between objects as edges, so that a graph structure can be built. This project aims to understand Scene Graph Generation, explore several classic methodologies by evaluating and comparing the correctness of the scene graphs they predict, and find the key factors that affect the correctness of scene graphs. The project yielded several insights. For example, prior knowledge (which can be interpreted as common sense) can greatly affect the performance of Scene Graph Generation. Additionally, models with a better backbone were observed to generate more accurate Scene Graphs. Beyond the exploration of methodologies, a software application was developed to convert photos captured from a connected webcam into Scene Graphs. |