Deep learning for real-world object detection

Despite achieving significant progress, most existing detectors are designed to detect objects in academic contexts and give little consideration to real-world scenarios. In real-world applications, the scale variance of objects can be significantly higher than that of objects in academic contexts; in addition, exi...

Bibliographic Details
Main Author:	WU, Xiongwei
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2020
Subjects:	Deep Learning; Deep Convolutional Neural Networks; Object Detection; Databases and Information Systems; Data Storage Systems
Online Access:	https://ink.library.smu.edu.sg/etd_coll/300 https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1300&context=etd_coll
Institution:	Singapore Management University

id	sg-smu-ink.etd_coll-1300
record_format	dspace
spelling	sg-smu-ink.etd_coll-1300 2020-09-13T15:00:06Z Deep learning for real-world object detection WU, Xiongwei Despite achieving significant progress, most existing detectors are designed to detect objects in academic contexts and give little consideration to real-world scenarios. In real-world applications, the scale variance of objects can be significantly higher than that of objects in academic contexts. In addition, existing methods are designed to achieve localization with relatively low precision, whereas real-world scenarios demand more precise localization. Finally, existing methods are optimized with huge amounts of annotated data, but in certain real-world scenarios only a few samples are available. In this dissertation, we aim to explore novel techniques that address these research challenges and make object detection algorithms practical for real-world applications. The first problem is scale-invariant detection. Detecting objects at multiple scales is covered in existing detection benchmarks. However, in real-world applications the scale variance of objects is extremely high, and handling it requires more discriminative features. Face detection is a suitable benchmark for evaluating scale-invariant detection because faces appear at vastly different scales. In this dissertation, we propose a novel framework, "Feature Agglomeration Networks" (FAN), to build a new single-stage face detector. A novel feature agglomeration block is proposed to enhance low-level feature representations, and the model is optimized in a hierarchical manner. FAN achieves state-of-the-art results on real-world face detection benchmarks with real-time inference speed. The second problem is high-quality detection, which requires detectors to predict more precise localization. We propose two novel detection frameworks for high-quality detection: "Bidirectional Pyramid Networks" (BPN) and "KPNet".
In BPN, a Bidirectional Feature Pyramid structure is proposed for robust feature representations, and a Cascade Anchor Refinement is proposed to gradually refine the quality of pre-designed anchors. To eliminate the initial anchor design step in BPN, KPNet is proposed, which automatically learns to optimize a dynamic set of high-quality keypoints without heuristic anchor design. Both BPN and KPNet show significant improvements over existing methods on the MSCOCO dataset, especially in high-quality detection settings. The third problem is few-shot detection, where only a few training samples are available. Inspired by the principles of meta-learning, we propose two novel meta-learning-based few-shot detectors: "Meta-RCNN" and "Meta Contrastive Detector" (MCD). Meta-RCNN learns a binary object detector in an episodic learning paradigm on the training data with a class-aware attention module, and it can be meta-optimized end to end. Building on Meta-RCNN, MCD follows the principle of contrastive learning to enhance the feature representation for few-shot detection, and a new hard negative sampling strategy is proposed to address the imbalance of training samples. We demonstrate the effectiveness of Meta-RCNN and MCD for few-shot detection on the Pascal VOC dataset and obtain promising results. The proposed techniques address the problems discussed and show significant improvements in real-world utility. 2020-07-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/etd_coll/300 https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1300&context=etd_coll http://creativecommons.org/licenses/by-nc-nd/4.0/ Dissertations and Theses Collection (Open Access) eng Institutional Knowledge at Singapore Management University Deep Learning Deep Convolutional Neural Networks Object Detection Databases and Information Systems Data Storage Systems
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Deep Learning; Deep Convolutional Neural Networks; Object Detection; Databases and Information Systems; Data Storage Systems
spellingShingle	Deep Learning Deep Convolutional Neural Networks Object Detection Databases and Information Systems Data Storage Systems WU, Xiongwei Deep learning for real-world object detection
description	Despite achieving significant progress, most existing detectors are designed to detect objects in academic contexts and give little consideration to real-world scenarios. In real-world applications, the scale variance of objects can be significantly higher than that of objects in academic contexts. In addition, existing methods are designed to achieve localization with relatively low precision, whereas real-world scenarios demand more precise localization. Finally, existing methods are optimized with huge amounts of annotated data, but in certain real-world scenarios only a few samples are available. In this dissertation, we aim to explore novel techniques that address these research challenges and make object detection algorithms practical for real-world applications. The first problem is scale-invariant detection. Detecting objects at multiple scales is covered in existing detection benchmarks. However, in real-world applications the scale variance of objects is extremely high, and handling it requires more discriminative features. Face detection is a suitable benchmark for evaluating scale-invariant detection because faces appear at vastly different scales. In this dissertation, we propose a novel framework, "Feature Agglomeration Networks" (FAN), to build a new single-stage face detector. A novel feature agglomeration block is proposed to enhance low-level feature representations, and the model is optimized in a hierarchical manner. FAN achieves state-of-the-art results on real-world face detection benchmarks with real-time inference speed. The second problem is high-quality detection, which requires detectors to predict more precise localization. We propose two novel detection frameworks for high-quality detection: "Bidirectional Pyramid Networks" (BPN) and "KPNet".
In BPN, a Bidirectional Feature Pyramid structure is proposed for robust feature representations, and a Cascade Anchor Refinement is proposed to gradually refine the quality of pre-designed anchors. To eliminate the initial anchor design step in BPN, KPNet is proposed, which automatically learns to optimize a dynamic set of high-quality keypoints without heuristic anchor design. Both BPN and KPNet show significant improvements over existing methods on the MSCOCO dataset, especially in high-quality detection settings. The third problem is few-shot detection, where only a few training samples are available. Inspired by the principles of meta-learning, we propose two novel meta-learning-based few-shot detectors: "Meta-RCNN" and "Meta Contrastive Detector" (MCD). Meta-RCNN learns a binary object detector in an episodic learning paradigm on the training data with a class-aware attention module, and it can be meta-optimized end to end. Building on Meta-RCNN, MCD follows the principle of contrastive learning to enhance the feature representation for few-shot detection, and a new hard negative sampling strategy is proposed to address the imbalance of training samples. We demonstrate the effectiveness of Meta-RCNN and MCD for few-shot detection on the Pascal VOC dataset and obtain promising results. The proposed techniques address the problems discussed and show significant improvements in real-world utility.
format	text
author	WU, Xiongwei
author_facet	WU, Xiongwei
author_sort	WU, Xiongwei
title	Deep learning for real-world object detection
title_short	Deep learning for real-world object detection
title_full	Deep learning for real-world object detection
title_fullStr	Deep learning for real-world object detection
title_full_unstemmed	Deep learning for real-world object detection
title_sort	deep learning for real-world object detection
publisher	Institutional Knowledge at Singapore Management University
publishDate	2020
url	https://ink.library.smu.edu.sg/etd_coll/300 https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1300&context=etd_coll
_version_	1712300945408262144
