OW-Mamba: Mamba for open world object detection

Bibliographic Details
Main Author: Sun, Heyuan
Other Authors: Yap Kim Hui
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2024
Online Access:https://hdl.handle.net/10356/181673
Institution: Nanyang Technological University
Description
Summary: Object detection is a fundamental task in computer vision, and a more challenging variant, open-world object detection, has recently gained attention. This task involves not only identifying novel, unknown objects but also incrementally learning to classify them as their labels become available. Two notable approaches, Open World Detection Transformer (OW-DETR) and Localization and Identification Cascade Detection Transformer (CAT), have been proposed to address this challenge. However, these methods are prone to producing false unknown detections and are computationally expensive, especially on high-resolution images. Their performance in detecting novel objects also leaves significant room for improvement. To overcome these limitations, we propose OW-Mamba, an enhanced approach based on CAT. Specifically, we replace the ResNet-50 backbone in CAT with VMamba-T and introduce a dual-stream decoder, which improves both localization and classification. Furthermore, we refine the pseudo-labeling process to reduce the generation of false positives. Extensive experiments show that OW-Mamba outperforms CAT on Tasks 1, 3, and 4 while also significantly reducing the required time and GPU memory.
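The abstract does not describe how the refined pseudo-labeling works. The sketch below is purely illustrative and is not the thesis's method. It shows one generic way an open-world detector can select unknown-object pseudo-labels while limiting false positives. High-scoring proposals that do not overlap any known-class annotation become candidates, and duplicate candidates are then suppressed. The function names (`iou`, `select_unknown_pseudo_labels`), the parameters (`score_thresh`, `iou_thresh`, `top_k`), and their default values are hypothetical assumptions.

```python
from typing import List, Sequence, Tuple

# Axis-aligned box as (x1, y1, x2, y2); this format is an assumption for the sketch.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_unknown_pseudo_labels(
    proposals: Sequence[Box],
    scores: Sequence[float],
    known_boxes: Sequence[Box],
    top_k: int = 5,            # hypothetical cap on pseudo-labels per image
    score_thresh: float = 0.5,  # hypothetical minimum objectness-style score
    iou_thresh: float = 0.5,    # hypothetical overlap threshold
) -> List[Box]:
    """Hypothetical unknown pseudo-label selection (not the OW-Mamba procedure)."""
    candidates = []
    for box, score in zip(proposals, scores):
        # Drop low-confidence proposals, a common source of false unknowns.
        if score < score_thresh:
            continue
        # Drop proposals that already cover an annotated known-class object.
        if any(iou(box, gt) >= iou_thresh for gt in known_boxes):
            continue
        candidates.append((score, box))

    # Visit candidates from highest to lowest score.
    candidates.sort(key=lambda c: c[0], reverse=True)

    kept: List[Box] = []
    for _, box in candidates:
        if len(kept) >= top_k:
            break
        # Greedy duplicate suppression among the selected unknowns.
        if all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept
```

For example, suppose the proposals are `[(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (50, 50, 60, 60)]` with scores `[0.9, 0.8, 0.7, 0.3]` and one known box `(20, 20, 30, 30)`. Only `(0, 0, 10, 10)` survives. The second box is a near-duplicate of the first, the third matches a known object, and the fourth scores too low.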