Contrastive knowledge transfer from CLIP for open vocabulary object detection
Object detection has made remarkable progress in recent years. However, in real-world scenarios, a model is expected to generalize to novel objects that it was never explicitly trained on. Though pre-trained vision-language models have shown powerful results on zero-shot classification, adapting them to d...
Saved in:
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Master by Research |
Language: | English |
Published: |
Nanyang Technological University
2023
|
Subjects: | |
Online Access: | https://hdl.handle.net/10356/172024 |
Tags: | |
Institution: | Nanyang Technological University |
Language: | English |
id |
sg-ntu-dr.10356-172024 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-1720242023-12-01T01:52:37Z Contrastive knowledge transfer from CLIP for open vocabulary object detection Zhang, Chuhan Hanwang Zhang School of Computer Science and Engineering hanwangzhang@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Object detection has made remarkable progress in recent years. However, in real-world scenarios, a model is expected to generalize to novel objects that it was never explicitly trained on. Though pre-trained vision-language models have shown powerful results on zero-shot classification, adapting them to detection is non-trivial because detection involves region-level reasoning as well as non-semantic localization. In this dissertation, a method built on a DETR-style architecture and contrastive distillation is proposed. It utilizes the CLIP model to provide semantically rich features as priors for querying novel objects. In addition, the model is trained to align with CLIP in a latent space via a contrastive loss, enabling it to distinguish unseen classes. The effectiveness of the proposed method is supported by experimental results of 65.3 novel AR and 23.4 novel mAP on the MSCOCO dataset. Its variants outperform their counterparts by 3.5 mAP and 3.1 mAP respectively. The proposed contrastive distillation loss can also be integrated into other frameworks and achieves the best performance. The significance of the different modules is revealed through ablation and visualization studies. The qualitative analysis demonstrates the potential of the proposed method as an effective on-the-fly detector. In the final part, a discussion section analyzes the critical factors that contribute to open vocabulary object detection.
It provides a unified perspective on reconstruction loss and contrastive loss, offering an interpretation of feature transfer within the context of open vocabulary scenarios. Master of Engineering 2023-11-20T01:48:04Z 2023-11-20T01:48:04Z 2023 Thesis-Master by Research Zhang, C. (2023). Contrastive knowledge transfer from CLIP for open vocabulary object detection. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/172024 https://hdl.handle.net/10356/172024 10.32657/10356/172024 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision |
spellingShingle |
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Zhang, Chuhan Contrastive knowledge transfer from CLIP for open vocabulary object detection |
description |
Object detection has made remarkable progress in recent years. However, in real-world scenarios, a model is expected to generalize to novel objects that it was never explicitly trained on. Though pre-trained vision-language models have shown powerful results on zero-shot classification, adapting them to detection is non-trivial because detection involves region-level reasoning as well as non-semantic localization.
In this dissertation, a method built on a DETR-style architecture and contrastive distillation is proposed. It utilizes the CLIP model to provide semantically rich features as priors for querying novel objects. In addition, the model is trained to align with CLIP in a latent space via a contrastive loss, enabling it to distinguish unseen classes.
The effectiveness of the proposed method is supported by experimental results of 65.3 novel AR and 23.4 novel mAP on the MSCOCO dataset. Its variants outperform their counterparts by 3.5 mAP and 3.1 mAP respectively. The proposed contrastive distillation loss can also be integrated into other frameworks and achieves the best performance. The significance of the different modules is revealed through ablation and visualization studies. The qualitative analysis demonstrates the potential of the proposed method as an effective on-the-fly detector.
In the final part, a discussion section analyzes the critical factors that contribute to open vocabulary object detection. It provides a unified perspective on reconstruction loss and contrastive loss, offering an interpretation of feature transfer within the context of open vocabulary scenarios. |
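The contrastive alignment the abstract describes can be illustrated with a minimal InfoNCE-style sketch. This is not code from the thesis; the function and variable names are hypothetical, and the toy 2-D vectors stand in for detector region features and CLIP embeddings. The idea: each region feature is pulled toward its matching CLIP embedding and pushed away from the other embeddings in the batch.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_distillation_loss(region_feats, clip_feats, temperature=0.1):
    """InfoNCE-style loss: region i should match clip_feats[i];
    all other CLIP embeddings in the batch act as negatives."""
    loss = 0.0
    for i, r in enumerate(region_feats):
        logits = [cosine(r, c) / temperature for c in clip_feats]
        m = max(logits)  # subtract max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += -(logits[i] - log_denom)  # -log softmax at the positive
    return loss / len(region_feats)

# Toy batch: two region features and their CLIP embeddings.
regions = [[1.0, 0.0], [0.0, 1.0]]
clip_embs = [[0.9, 0.1], [0.1, 0.9]]
aligned = contrastive_distillation_loss(regions, clip_embs)
shuffled = contrastive_distillation_loss(regions, clip_embs[::-1])
print(aligned < shuffled)  # correctly paired features give a lower loss
```

Minimizing this loss drives the detector's latent space toward CLIP's, which is what lets the detector score regions against embeddings of class names it never saw during training.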
author2 |
Hanwang Zhang |
author_facet |
Hanwang Zhang Zhang, Chuhan |
format |
Thesis-Master by Research |
author |
Zhang, Chuhan |
author_sort |
Zhang, Chuhan |
title |
Contrastive knowledge transfer from CLIP for open vocabulary object detection |
title_short |
Contrastive knowledge transfer from CLIP for open vocabulary object detection |
title_full |
Contrastive knowledge transfer from CLIP for open vocabulary object detection |
title_fullStr |
Contrastive knowledge transfer from CLIP for open vocabulary object detection |
title_full_unstemmed |
Contrastive knowledge transfer from CLIP for open vocabulary object detection |
title_sort |
contrastive knowledge transfer from clip for open vocabulary object detection |
publisher |
Nanyang Technological University |
publishDate |
2023 |
url |
https://hdl.handle.net/10356/172024 |
_version_ |
1784855536763666432 |