Grounding referring expression in computer vision

This project studies the integration of language and vision in computer vision, focusing on Grounding Referring Expressions utilising the state-of-the-art GroundingDINO model. We address the topic of object identification and segmentation, emphasising zero-shot models’ ability to recognise items...

Bibliographic Details
Main Author:	Yuen, Shaun Chien Wee
Other Authors:	Hanwang Zhang
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2024
Subjects:	Computer and Information Science; Computer vision; Grounding; Artificial intelligence
Online Access:	https://hdl.handle.net/10356/174979
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-174979
record_format	dspace
spelling	sg-ntu-dr.10356-174979; 2024-04-19T15:46:39Z; Grounding referring expression in computer vision; Yuen, Shaun Chien Wee; Hanwang Zhang; School of Computer Science and Engineering; hanwangzhang@ntu.edu.sg; Computer and Information Science; Computer vision; Grounding; Artificial intelligence; Bachelor's degree; 2024-04-19T02:11:03Z; 2024; Final Year Project (FYP); Yuen, S. C. W. (2024). Grounding referring expression in computer vision. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/174979; en; SCSE23-0212; application/pdf; Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Computer and Information Science; Computer vision; Grounding; Artificial intelligence
description	This project studies the integration of language and vision in computer vision, focusing on grounding referring expressions with the state-of-the-art GroundingDINO model. We address object identification and segmentation, emphasising zero-shot models’ ability to recognise items outside of their training sets. GroundingDINO, an improvement on the DINO model, is essential to our study, as it has significant capabilities in open-set object detection and natural language processing. The project aims to create a Proof of Concept Demo Application demonstrating GroundingDINO’s practical uses in improving human-computer interactions. Our literature review examines the evolution of computer vision models and the distinguishing characteristics of GroundingDINO, and identifies gaps in current research, especially in dynamic situations such as real-time video analysis. The project contributes to the field by highlighting the potential of GroundingDINO in various industries, from surveillance to autonomous systems, and addresses the need for improved language-based object detection in computer vision.
author2	Hanwang Zhang
author_facet	Hanwang Zhang Yuen, Shaun Chien Wee
format	Final Year Project
author	Yuen, Shaun Chien Wee
author_sort	Yuen, Shaun Chien Wee
title	Grounding referring expression in computer vision
title_sort	grounding referring expression in computer vision
publisher	Nanyang Technological University
publishDate	2024
url	https://hdl.handle.net/10356/174979
_version_	1800916117888172032

