Object detection with deep neural networks under constrained scenarios

Object detection, which aims to recognize and locate objects within images using bounding boxes, is one of the most fundamental tasks in computer vision. Object detection forms the basis for many other computer vision tasks and has extensive use cases, such as autonomous driving, surveillance, robot...

Bibliographic Details
Main Author: Zhang, Gongjie
Other Authors: Lu Shijian
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2023
Subjects:
Online Access: https://hdl.handle.net/10356/164687
Institution: Nanyang Technological University
id sg-ntu-dr.10356-164687
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
spellingShingle Engineering::Computer science and engineering
Zhang, Gongjie
Object detection with deep neural networks under constrained scenarios
description Object detection, which aims to recognize and locate objects within images using bounding boxes, is one of the most fundamental tasks in computer vision. It forms the basis for many other computer vision tasks and has extensive use cases, such as autonomous driving, surveillance, and robotic vision. In the past ten years, object detection has made unprecedented progress with the development of deep neural networks. Compared with prior art that adopts handcrafted features, modern object detectors benefit from the strong feature representations produced by deep neural networks and have achieved strong performance on many challenging generic object detection benchmarks, such as MSCOCO and OpenImages. However, deep-neural-network-based object detectors remain far from perfect and face many challenges under various constrained scenarios. First, modern object detectors rely heavily on visual cues such as texture details, contours, and contrast with the background. In some scenarios (e.g., adverse weather or aerial object detection), these cues are largely degraded or missing, which makes detection substantially harder. Second, deep-neural-network-based object detectors usually require many training iterations, which are time-consuming and expensive, or even unaffordable for many researchers and companies. Third, as modern object detectors are mostly based on deep neural networks, they require huge amounts of training samples to learn a visual concept. However, such large-scale annotated datasets are not always available due to expensive human labeling costs or difficulty in data acquisition. Fourth, when deploying modern detectors on edge devices with limited computational capacity, their complexity can be a bottleneck due to run-time requirements. This thesis focuses on advancing object detection in several constrained scenarios.
First, we design a novel Context-Aware Detection Network (CAD-Net) for accurate and robust object detection in optical remote sensing imagery. Generic object detection techniques usually suffer a sharp performance drop when applied directly to remote sensing images, largely because objects in such images differ in appearance: they show sparse texture, low contrast, arbitrary orientations, and large scale variations. To adapt to this scenario, CAD-Net extracts scene-level and object-level contextual information, which is highly correlated with the objects of interest, to provide extra guidance. In addition, a spatial-and-scale-aware attention module is designed to highlight scale-adaptive features and the degraded texture details. Second, we design a novel semantic-aligned matching mechanism to accelerate the convergence of the recently proposed DEtection TRansformer (DETR), which reduces the training iterations by over 95% while improving detection accuracy. Third, we design Meta-DETR for few-shot object detection, which tackles the challenge of training with only a few annotated examples. Meta-DETR fully bypasses the low-quality object proposals for novel classes, thus outperforming prior R-CNN-based few-shot object detectors. In addition, Meta-DETR performs meta-learning on a set of support classes simultaneously, thus effectively leveraging inter-class correlation for better generalization. Fourth, we design a novel paradigm, named Iterative Multi-scale Feature Aggregation (IMFA), to enable the efficient use of multi-scale features in recently proposed Transformer-based object detectors. Directly incorporating multi-scale features leads to prohibitive computational costs because the attention mechanism processes high-resolution features inefficiently. IMFA exploits sparse multi-scale features only from the most promising and informative locations and significantly improves detection accuracy on multiple object detectors at marginal cost.
author2 Lu Shijian
author_facet Lu Shijian
Zhang, Gongjie
format Thesis-Doctor of Philosophy
author Zhang, Gongjie
author_sort Zhang, Gongjie
title Object detection with deep neural networks under constrained scenarios
title_short Object detection with deep neural networks under constrained scenarios
title_full Object detection with deep neural networks under constrained scenarios
title_fullStr Object detection with deep neural networks under constrained scenarios
title_full_unstemmed Object detection with deep neural networks under constrained scenarios
title_sort object detection with deep neural networks under constrained scenarios
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/164687
_version_ 1759854941577412608
spelling sg-ntu-dr.10356-164687 2023-03-06T07:30:04Z Object detection with deep neural networks under constrained scenarios Zhang, Gongjie Lu Shijian School of Computer Science and Engineering Shijian.Lu@ntu.edu.sg Engineering::Computer science and engineering Doctor of Philosophy 2023-02-09T03:00:01Z 2023-02-09T03:00:01Z 2022 Thesis-Doctor of Philosophy Zhang, G. (2022). Object detection with deep neural networks under constrained scenarios. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/164687 https://hdl.handle.net/10356/164687 10.32657/10356/164687 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University