OW-Mamba: Mamba for open world object detection
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Master by Coursework |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/181673 |
Institution: | Nanyang Technological University |
Summary: | Object detection is a fundamental task in computer vision, and recently, a more challenging variant known as open-world object detection has gained attention. This task involves not only identifying novel, unknown objects but also incrementally learning to classify them as labels become available. Two notable approaches, Open World Detection Transformer (OWDETR) and Localization and Identification Cascade Detection Transformer (CAT), have been proposed to address this challenge. However, these methods are prone to generating false unknown objects and are computationally expensive, especially with high-resolution images. Additionally, there is significant room for improvement in detecting novel objects.
To overcome these limitations, we propose OW-Mamba, an enhanced approach based on CAT. Specifically, we replace the ResNet-50 backbone in CAT with VMamba-T and introduce a dual-stream decoder, which improves both localization and classification. Furthermore, we refine the pseudo-labeling process to reduce false positives. Extensive experiments show that OW-Mamba outperforms CAT in Tasks 1, 3, and 4, while also significantly reducing the time and GPU memory required. |