Fast and accurate vision-based pedestrian detection

Pedestrian detection is an essential task in applications such as automotive safety, surveillance, and robotics. Achieving accurate vision-based pedestrian detection faces many challenges arising from highly cluttered background, high intra-class variations, inconsistent illumination, heavy occlusio...

Bibliographic Details
Main Author:	Zhou, Chengju
Other Authors:	Lam Siew Kei
Format:	Thesis-Doctor of Philosophy
Language:	English
Published:	Nanyang Technological University 2021
Subjects:	Engineering::Computer science and engineering
Online Access:	https://hdl.handle.net/10356/148933
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-148933
record_format	dspace
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering
spellingShingle	Engineering::Computer science and engineering Zhou, Chengju Fast and accurate vision-based pedestrian detection
description	Pedestrian detection is an essential task in applications such as automotive safety, surveillance, and robotics. Accurate vision-based pedestrian detection faces many challenges arising from highly cluttered backgrounds, high intra-class variation, inconsistent illumination, heavy occlusion, and the need to detect small-scale pedestrians. In addition, practical applications often require fast pedestrian detection on embedded systems with stringent computational constraints. This PhD research aims to develop fast and accurate vision-based pedestrian detection using hand-crafted features and deep learning methods to meet the varied requirements of real-world applications. We first proposed a non-deep-learning pedestrian detection framework based on the top-performing Filtered Channel Features (FCF) approach. In contrast to existing works that use many matrix-form filters or a few very large filters, the proposed method exploits binary vector-form filters to build a robust pedestrian feature representation effectively and efficiently. A two-stage induced group cost-sensitive RealBoost is introduced to assign different costs to misclassified training samples according to their difficulty, in order to enhance detection of harder samples. Two strategies are proposed to further improve overall detection speed, one at the image pyramid level and one at the channel feature level. Experimental results on the widely used Caltech benchmark show that the proposed framework achieves much better detection performance and runs about 148x faster than the best reported FCF method.
A fast and robust pedestrian detection framework was developed next, which exploits lightweight vector-form decorrelated filters to build a more robust feature representation. A group cost-sensitive BoostLR (Boosting with Loss Regularization) is used to give more attention to harder samples during training, which enables controlled generalization and improves detection performance. Experimental results on the INRIA, Caltech and CityPersons pedestrian detection benchmarks demonstrate that the proposed detection framework obtains better detection performance than all state-of-the-art non-deep-learning approaches and runs an order of magnitude faster than existing top-performing FCF methods. We then explored deep learning methods for accurate and fast pedestrian detection. A unified multi-task neural network learning architecture is proposed to efficiently and effectively inter-fuse the tasks of semantic segmentation and pedestrian detection. In the proposed learning architecture, we employed Faster R-CNN as the base detector and attached a lightweight semantic segmentation branch that enables end-to-end hard parameter sharing to improve pedestrian detection while maintaining computational efficiency. A simple anchor matching strategy is designed to alleviate the problem of feature misalignment when detecting heavily occluded pedestrians. The proposed multi-task learning architecture achieves improved pedestrian detection in diverse scenarios while maintaining lower computational complexity. Furthermore, the proposed method can obtain improved performance with downsampled images as input, which notably reduces the overall computational complexity. Experimental results on the well-known CityPersons and Caltech pedestrian detection benchmarks demonstrate that the proposed learning architecture runs much faster than state-of-the-art pedestrian detection approaches while obtaining competitive detection accuracy.
Faster R-CNN methods obtain top performance in the pedestrian detection task but incur extremely high computational complexity, since the complexity of R-CNN increases linearly with the number of input proposals. To overcome this problem, we proposed an R-FCN-based pedestrian detection framework that incorporates semantic segmentation confidence modules for the RPN head and the R-FCN head, together with a cascaded R-FCN head. The semantic segmentation confidence modules employ a semantic segmentation branch, supervised by coarse box-wise annotations designed for the task of pedestrian detection, to obtain a semantic segmentation result. The semantic segmentation confidence is then computed and used as auxiliary prior knowledge for classification in RPN proposal selection and R-FCN head prediction. The proposed cascaded R-FCN head progressively refines the prediction accuracy with negligible computational overhead. Experimental results on the well-known CityPersons and MOT17Det pedestrian detection benchmarks demonstrate that the proposed detection framework achieves competitive detection accuracy with about a 3x speedup over state-of-the-art pedestrian detection methods.
author2	Lam Siew Kei
author_facet	Lam Siew Kei Zhou, Chengju
format	Thesis-Doctor of Philosophy
author	Zhou, Chengju
author_sort	Zhou, Chengju
title	Fast and accurate vision-based pedestrian detection
title_short	Fast and accurate vision-based pedestrian detection
title_full	Fast and accurate vision-based pedestrian detection
title_fullStr	Fast and accurate vision-based pedestrian detection
title_full_unstemmed	Fast and accurate vision-based pedestrian detection
title_sort	fast and accurate vision-based pedestrian detection
publisher	Nanyang Technological University
publishDate	2021
url	https://hdl.handle.net/10356/148933
_version_	1705151316375896064
spelling	sg-ntu-dr.10356-148933 2021-07-08T16:00:36Z Fast and accurate vision-based pedestrian detection Zhou, Chengju Lam Siew Kei School of Computer Science and Engineering Hardware & Embedded Systems Lab (HESL) ASSKLam@ntu.edu.sg Engineering::Computer science and engineering Doctor of Philosophy 2021-05-14T08:05:22Z 2021-05-14T08:05:22Z 2021 Thesis-Doctor of Philosophy Zhou, C. (2021). Fast and accurate vision-based pedestrian detection. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/148933 10.32657/10356/148933 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University

Similar Items