Object detection with deep neural networks under constrained scenarios
Object detection, which aims to recognize and locate objects within images using bounding boxes, is one of the most fundamental tasks in computer vision. Object detection forms the basis for many other computer vision tasks and has extensive use cases, such as autonomous driving, surveillance, robot...
Main Author: | Zhang, Gongjie |
---|---|
Other Authors: | Lu Shijian |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2023 |
Subjects: | Engineering::Computer science and engineering |
Online Access: | https://hdl.handle.net/10356/164687 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-164687 |
---|---|
record_format |
dspace |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering |
spellingShingle |
Engineering::Computer science and engineering Zhang, Gongjie Object detection with deep neural networks under constrained scenarios |
description |
Object detection, which aims to recognize and locate objects within images using bounding boxes, is one of the most fundamental tasks in computer vision. Object detection forms the basis for many other computer vision tasks and has extensive use cases, such as autonomous driving, surveillance, and robotic vision. In the past ten years, object detection has made unprecedented progress with the development of deep neural networks. Compared with prior art that relies on handcrafted features, modern object detectors benefit from the strong feature representations produced by deep neural networks and have achieved strong performance on many challenging generic object detection benchmarks, such as MSCOCO and OpenImages.
However, deep-neural-network-based object detectors are still far from perfect and face many challenges under various constrained scenarios. First, modern object detectors rely heavily on visual cues such as texture details, contours, and contrast with the background. In some scenarios (e.g., adverse weather or aerial object detection), these cues are largely degraded or missing, which makes object detection substantially harder. Second, deep-neural-network-based object detectors usually require long training schedules, which are time-consuming and expensive, or even unaffordable for many researchers and companies. Third, as modern object detectors are mostly based on deep neural networks, they require huge numbers of training samples to learn a visual concept. However, such large-scale annotated datasets are not always available because of expensive human labeling or difficulty in data acquisition. Fourth, when modern detectors are deployed on edge devices with limited computational capacity, their complexity can become a bottleneck under run-time requirements.
This thesis focuses on advancing object detection in several constrained scenarios. First, we design a novel Context-Aware Detection Network (CAD-Net) for accurate and robust object detection within optical remote sensing imagery. Generic object detection techniques usually experience a sharp performance drop when directly applied to remote sensing images, largely because objects in such images differ in appearance, with sparse texture, low contrast, arbitrary orientations, large scale variations, etc. To adapt to this scenario, CAD-Net extracts scene-level and object-level contextual information, which is highly correlated with objects of interest, to provide extra guidance. In addition, a spatial-and-scale-aware attention module is designed to highlight scale-adaptive features and the degraded texture details. Second, we design a novel semantic-aligned matching mechanism to accelerate the convergence of the newly proposed DEtection TRansformer (DETR), which reduces the training iterations by over 95% with improved detection accuracy. Third, we design Meta-DETR for few-shot object detection, which tackles the challenge of training with only a few annotated examples. Meta-DETR fully bypasses the low-quality object proposals for novel classes, thus achieving superior performance to prior R-CNN-based few-shot object detectors. In addition, Meta-DETR performs meta-learning on a set of support classes simultaneously, thus effectively leveraging the correlation among different classes for better generalization. Fourth, we design a novel paradigm, named Iterative Multi-scale Feature Aggregation (IMFA), to enable the efficient use of multi-scale features in the newly proposed Transformer-based object detectors. Directly incorporating multi-scale features leads to prohibitive computational costs because the attention mechanism is inefficient at processing high-resolution features. IMFA exploits sparse multi-scale features only from the most promising and informative locations and significantly improves detection accuracy on multiple object detectors at marginal cost. |
author2 |
Lu Shijian |
author_facet |
Lu Shijian Zhang, Gongjie |
format |
Thesis-Doctor of Philosophy |
author |
Zhang, Gongjie |
author_sort |
Zhang, Gongjie |
title |
Object detection with deep neural networks under constrained scenarios |
title_short |
Object detection with deep neural networks under constrained scenarios |
title_full |
Object detection with deep neural networks under constrained scenarios |
title_fullStr |
Object detection with deep neural networks under constrained scenarios |
title_full_unstemmed |
Object detection with deep neural networks under constrained scenarios |
title_sort |
object detection with deep neural networks under constrained scenarios |
publisher |
Nanyang Technological University |
publishDate |
2023 |
url |
https://hdl.handle.net/10356/164687 |
_version_ |
1759854941577412608 |
spelling |
sg-ntu-dr.10356-164687 2023-03-06T07:30:04Z Object detection with deep neural networks under constrained scenarios Zhang, Gongjie Lu Shijian School of Computer Science and Engineering Shijian.Lu@ntu.edu.sg Engineering::Computer science and engineering Doctor of Philosophy 2023-02-09T03:00:01Z 2023-02-09T03:00:01Z 2022 Thesis-Doctor of Philosophy Zhang, G. (2022). Object detection with deep neural networks under constrained scenarios. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/164687 10.32657/10356/164687 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University |