Fast and accurate vision-based pedestrian detection
Pedestrian detection is an essential task in applications such as automotive safety, surveillance, and robotics. Achieving accurate vision-based pedestrian detection faces many challenges arising from highly cluttered background, high intra-class variations, inconsistent illumination, heavy occlusio...
Main Author: | Zhou, Chengju |
---|---|
Other Authors: | Lam Siew Kei |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2021 |
Subjects: | Engineering::Computer science and engineering |
Online Access: | https://hdl.handle.net/10356/148933 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-148933 |
---|---|
record_format |
dspace |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering |
spellingShingle |
Engineering::Computer science and engineering Zhou, Chengju Fast and accurate vision-based pedestrian detection |
description |
Pedestrian detection is an essential task in applications such as automotive safety, surveillance, and robotics. Accurate vision-based pedestrian detection faces many challenges arising from highly cluttered backgrounds, high intra-class variation, inconsistent illumination, heavy occlusion, and the need to detect small-scale pedestrians. In addition, practical applications often require fast detection of pedestrians on embedded systems with limited computational resources. This PhD research aims to develop fast and accurate vision-based pedestrian detection using hand-crafted features and deep learning methods to meet the varied requirements of real-world applications.
We first proposed a non-deep-learning pedestrian detection framework based on the top-performing Filtered Channel Features (FCF) approach. In contrast to existing works that use many matrix-form filters or a few very large filters, the proposed method uses binary vector-form filters to build a robust pedestrian feature representation effectively and efficiently. A two-stage induced group cost-sensitive RealBoost is introduced that assigns different costs to misclassified training samples according to their difficulty, in order to improve the detection of harder samples. Two strategies are proposed to further improve overall detection speed at the image pyramid level and the channel feature level. Experimental results on the widely used Caltech benchmark show that the proposed framework achieves much better detection performance and runs about 148x faster than the best reported FCF method.
A fast and robust pedestrian detection framework was developed next, which uses lightweight vector-form decorrelated filters to build a more robust feature representation. A group cost-sensitive BoostLR (Boosting with Loss Regularization) is used to give more attention to the harder samples during training, which enables controlled generalization and improves detection performance. Experimental results on the INRIA, Caltech, and CityPersons pedestrian detection benchmarks demonstrate that the proposed detection framework achieves better detection performance than all state-of-the-art non-deep-learning approaches and runs an order of magnitude faster than existing top-performing FCF methods.
We then explored deep learning methods for accurate and fast pedestrian detection. A unified multi-task neural network learning architecture is proposed to fuse the tasks of semantic segmentation and pedestrian detection efficiently and effectively. In the proposed architecture, we employed Faster R-CNN as the base detector and attached a lightweight semantic segmentation branch that enables end-to-end hard parameter sharing to improve pedestrian detection while maintaining computational efficiency. A simple anchor matching strategy is designed to alleviate the problem of feature misalignment when detecting heavily occluded pedestrians. The proposed multi-task learning architecture achieves improved pedestrian detection in diverse scenarios while maintaining lower computational complexity. Furthermore, the proposed method obtains improved performance with downsampled images as input, which notably reduces the overall computational complexity. Experimental results on the well-known CityPersons and Caltech pedestrian detection benchmarks demonstrate that the proposed learning architecture runs much faster than state-of-the-art pedestrian detection approaches while obtaining competitive detection accuracy.
Faster R-CNN methods obtain top performance in the pedestrian detection task but incur extremely high computational cost, as the complexity of R-CNN increases linearly with the number of input proposals. To overcome this problem, we proposed an R-FCN-based pedestrian detection framework that incorporates semantic segmentation confidence modules for the RPN head and the R-FCN head, together with a cascaded R-FCN head. The semantic segmentation confidence modules employ a semantic segmentation branch that uses coarse box-wise annotations, designed for the task of pedestrian detection, as supervision signals to obtain a semantic segmentation result. The semantic segmentation confidence is then computed and used as auxiliary classification prior knowledge for RPN proposal selection and R-FCN head prediction. The proposed cascaded R-FCN head progressively refines the prediction accuracy with negligible computational overhead. Experimental results on the well-known CityPersons and MOT17Det pedestrian detection benchmarks demonstrate that the proposed detection framework achieves competitive detection accuracy with about a 3x speedup over state-of-the-art pedestrian detection methods. |
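As a rough illustration of the semantic segmentation confidence idea summarized in the description above, the following Python sketch averages a segmentation probability map inside each proposal box and uses that value to rescale the detector score before keeping the top proposals. It is not the thesis implementation: the names `box_seg_confidence` and `rescore_proposals`, the mean-inside-box confidence, and the multiplicative fusion rule are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Box as (x1, y1, x2, y2) in pixel indices; x2 and y2 are exclusive.
Box = Tuple[int, int, int, int]


def box_seg_confidence(seg_prob: Sequence[Sequence[float]], box: Box) -> float:
    """Mean pedestrian probability of the segmentation map inside a box.

    Assumption: the confidence is a plain average; an empty box gets 0.0.
    """
    x1, y1, x2, y2 = box
    values = [seg_prob[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    return sum(values) / len(values) if values else 0.0


def rescore_proposals(
    scores: Sequence[float],
    boxes: Sequence[Box],
    seg_prob: Sequence[Sequence[float]],
    top_k: int,
) -> List[Tuple[Box, float]]:
    """Scale each score by its box's segmentation confidence, keep the top_k.

    Assumption: fusion is a simple product of the two values.
    """
    fused = [s * box_seg_confidence(seg_prob, b) for s, b in zip(scores, boxes)]
    order = sorted(range(len(boxes)), key=lambda i: fused[i], reverse=True)
    return [(boxes[i], fused[i]) for i in order[:top_k]]


if __name__ == "__main__":
    # Toy 4x4 map: the left half looks like "pedestrian", the right half does not.
    seg = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]
    proposals = [(0, 0, 2, 4), (2, 0, 4, 4), (1, 0, 3, 4)]
    detector_scores = [0.6, 0.9, 0.8]
    for box, score in rescore_proposals(detector_scores, proposals, seg, top_k=2):
        print(box, round(score, 3))
```

In this toy setup, a proposal with a high detector score that lies entirely on background is pushed down the ranking, which is the intuition behind using segmentation confidence as a prior for proposal selection; fewer proposals passed on also means less per-proposal computation.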
author2 |
Lam Siew Kei |
author_facet |
Lam Siew Kei Zhou, Chengju |
format |
Thesis-Doctor of Philosophy |
author |
Zhou, Chengju |
author_sort |
Zhou, Chengju |
title |
Fast and accurate vision-based pedestrian detection |
title_short |
Fast and accurate vision-based pedestrian detection |
title_full |
Fast and accurate vision-based pedestrian detection |
title_fullStr |
Fast and accurate vision-based pedestrian detection |
title_full_unstemmed |
Fast and accurate vision-based pedestrian detection |
title_sort |
fast and accurate vision-based pedestrian detection |
publisher |
Nanyang Technological University |
publishDate |
2021 |
url |
https://hdl.handle.net/10356/148933 |
_version_ |
1705151316375896064 |
spelling |
sg-ntu-dr.10356-148933 2021-07-08T16:00:36Z Fast and accurate vision-based pedestrian detection Zhou, Chengju Lam Siew Kei School of Computer Science and Engineering Hardware & Embedded Systems Lab (HESL) ASSKLam@ntu.edu.sg Engineering::Computer science and engineering Doctor of Philosophy 2021-05-14T08:05:22Z 2021-05-14T08:05:22Z 2021 Thesis-Doctor of Philosophy Zhou, C. (2021). Fast and accurate vision-based pedestrian detection. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/148933 https://hdl.handle.net/10356/148933 10.32657/10356/148933 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University |