Instance LSeg - exploring instance level information from visual language model

This final year project explores the potential of using large-scale pretrained visual language models in instance-level zero-shot computer vision tasks. Specifically, we propose Instance LSeg - a novel approach to extend the zero-shot semantic segmentation method LSeg to perform language guided...

Bibliographic Details
Main Author: Lin, Zixing
Other Authors: Lin Guosheng
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Online Access: https://hdl.handle.net/10356/171917
Institution: Nanyang Technological University
id sg-ntu-dr.10356-171917
record_format dspace
spelling sg-ntu-dr.10356-171917
2023-11-17T15:37:51Z
Instance LSeg - exploring instance level information from visual language model
Lin, Zixing
Lin Guosheng
School of Computer Science and Engineering
gslin@ntu.edu.sg
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
This final year project explores the potential of using large-scale pretrained visual language models in instance-level zero-shot computer vision tasks. Specifically, we propose Instance LSeg - a novel approach to extend the zero-shot semantic segmentation method LSeg to perform language-guided instance segmentation and grounding of natural language expressions in images. To evaluate our method, we used three popular referring datasets and observed that our method achieves highly competitive results against published generalized visual grounding baselines.
Bachelor of Engineering (Computer Science)
2023-11-16T02:40:46Z 2023-11-16T02:40:46Z
2023
Final Year Project (FYP)
Lin, Z. (2023). Instance LSeg - exploring instance level information from visual language model. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/171917
https://hdl.handle.net/10356/171917
en
application/pdf
Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
description This final year project explores the potential of using large-scale pretrained visual language models in instance-level zero-shot computer vision tasks. Specifically, we propose Instance LSeg - a novel approach to extend the zero-shot semantic segmentation method LSeg to perform language-guided instance segmentation and grounding of natural language expressions in images. To evaluate our method, we used three popular referring datasets and observed that our method achieves highly competitive results against published generalized visual grounding baselines.
author2 Lin Guosheng
author_facet Lin Guosheng
Lin, Zixing
format Final Year Project
author Lin, Zixing
author_sort Lin, Zixing
title Instance LSeg - exploring instance level information from visual language model
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/171917
_version_ 1783955593154789376