Deep learning for real-world object detection
Despite significant progress, most existing detectors are designed for academic benchmarks and give little consideration to real-world scenarios. In real-world applications, the scale variance of objects can be significantly higher than in academic contexts. In addition, exi...
Main Author: | WU, Xiongwei |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2020 |
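Subjects: | Deep Learning; Deep Convolutional Neural Networks; Object Detection; Databases and Information Systems; Data Storage Systems |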
Online Access: | https://ink.library.smu.edu.sg/etd_coll/300 https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1300&context=etd_coll |
Institution: | Singapore Management University |
id |
sg-smu-ink.etd_coll-1300 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.etd_coll-1300 2020-09-13T15:00:06Z Deep learning for real-world object detection WU, Xiongwei 2020-07-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/etd_coll/300 https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1300&context=etd_coll http://creativecommons.org/licenses/by-nc-nd/4.0/ Dissertations and Theses Collection (Open Access) eng Institutional Knowledge at Singapore Management University Deep Learning Deep Convolutional Neural Networks Object Detection Databases and Information Systems Data Storage Systems |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Deep Learning; Deep Convolutional Neural Networks; Object Detection; Databases and Information Systems; Data Storage Systems |
spellingShingle |
Deep Learning Deep Convolutional Neural Networks Object Detection Databases and Information Systems Data Storage Systems WU, Xiongwei Deep learning for real-world object detection |
description |
Despite significant progress, most existing detectors are designed for academic benchmarks and give little consideration to real-world scenarios. In real-world applications, the scale variance of objects can be significantly higher than in academic contexts. In addition, existing methods are designed to localize objects with relatively low precision, whereas real-world scenarios demand more precise localization. Finally, existing methods are optimized with huge amounts of annotated data, but in certain real-world scenarios only a few samples are available. In this dissertation, we explore novel techniques that address these research challenges and make object detection algorithms practical for real-world applications.
The first problem is scale-invariant detection. Existing detection benchmarks already cover objects at multiple scales. In real-world applications, however, the scale variance of objects is extremely high, which calls for more discriminative features. Face detection is a suitable benchmark for evaluating scale-invariant detection because faces appear at vastly different scales. In this dissertation, we propose a novel framework, "Feature Agglomeration Networks" (FAN), to build a new single-stage face detector. A novel feature agglomeration block enhances the low-level feature representation, and the model is optimized in a hierarchical manner. FAN achieved state-of-the-art results on real-world face detection benchmarks with real-time inference speed.
The second problem is high-quality detection, which requires detectors to predict more precise localization. In this dissertation, we propose two novel frameworks for high-quality detection: "Bidirectional Pyramid Networks" (BPN) and "KPNet". In BPN, a Bidirectional Feature Pyramid structure provides robust feature representations, and a Cascade Anchor Refinement gradually refines the quality of pre-designed anchors. To eliminate the initial anchor design step in BPN, KPNet automatically learns to optimize a dynamic set of high-quality keypoints without heuristic anchor design. Both BPN and KPNet show significant improvement over existing methods on the MS COCO dataset, especially in high-quality detection settings.
The third problem is few-shot detection, where only a few training samples are available. Inspired by the principle of meta-learning, we propose two novel meta-learning-based few-shot detectors: "Meta-RCNN" and "Meta Contrastive Detector" (MCD). Meta-RCNN learns a binary object detector in an episodic learning paradigm on the training data with a class-aware attention module, and it can be meta-optimized end to end. Building on Meta-RCNN, MCD follows the principle of contrastive learning to enhance the feature representation for few-shot detection, and a new hard negative sampling strategy addresses the imbalance of training samples. We demonstrate the effectiveness of Meta-RCNN and MCD for few-shot detection on the Pascal VOC dataset and obtain promising results.
The proposed techniques address the problems discussed and significantly improve real-world utility. |
format |
text |
author |
WU, Xiongwei |
author_facet |
WU, Xiongwei |
author_sort |
WU, Xiongwei |
title |
Deep learning for real-world object detection |
title_short |
Deep learning for real-world object detection |
title_full |
Deep learning for real-world object detection |
title_fullStr |
Deep learning for real-world object detection |
title_full_unstemmed |
Deep learning for real-world object detection |
title_sort |
deep learning for real-world object detection |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2020 |
url |
https://ink.library.smu.edu.sg/etd_coll/300 https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1300&context=etd_coll |
_version_ |
1712300945408262144 |
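
The description above refers to "high-quality detection" as requiring more precise localization. The record does not say how precision is measured. A common convention, assumed here and not taken from the dissertation, is to count a predicted box as correct only if its intersection over union (IoU) with the ground-truth box reaches a threshold, with stricter thresholds corresponding to higher-quality detection. The function names, box format, and thresholds below are illustrative only.

```python
from typing import Tuple

# Assumed box format (not from the record): (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if any.
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float) -> bool:
    """True if the prediction overlaps the ground truth by at least `threshold`."""
    return iou(pred, truth) >= threshold


# A prediction shifted by one unit in each direction on a 10x10 object.
truth = (0.0, 0.0, 10.0, 10.0)
pred = (1.0, 1.0, 11.0, 11.0)

print(round(iou(pred, truth), 3))   # overlap is 81 / 119
print(is_match(pred, truth, 0.5))   # accepted under a loose threshold
print(is_match(pred, truth, 0.75))  # rejected under a strict threshold
```

The same prediction passes at a threshold of 0.5 but fails at 0.75, which is the sense in which a stricter setting demands more precise localization.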