Enhanced multi-task learning architecture for detecting pedestrian at far distance

Bibliographic Details
Main Authors: Zhou, Chengju, Wu, Meiqing, Lam, Siew-Kei
Other Authors: College of Computing and Data Science
Format: Article
Language: English
Published: 2024
Online Access: https://hdl.handle.net/10356/178581
Institution: Nanyang Technological University
Description
Summary: Existing pedestrian detection methods suffer from performance degradation on small-scale pedestrians, i.e., those far from the camera. We present a pedestrian detection framework that is not only robust to both small- and large-scale pedestrians, but also significantly faster than state-of-the-art methods. The proposed framework incorporates semantic segmentation confidence modules for the RPN (Region Proposal Network) head and the R-FCN (Region-based Fully Convolutional Networks) head, together with a cascaded R-FCN head. The semantic segmentation confidence is extracted and used as auxiliary classification prior knowledge for RPN proposal selection and R-FCN head prediction. The cascaded R-FCN head then progressively refines the pedestrian predictions with negligible computational overhead. The framework also maintains high detection performance on down-sampled input images, which further reduces overall computational complexity. Experimental results on the CityPersons and MOT17Det datasets show that the proposed framework achieves competitive detection performance with about a 3× speedup over state-of-the-art methods.
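
The idea of using segmentation confidence as a prior for RPN proposal selection can be illustrated with a minimal sketch. The summary does not state how the two signals are combined, so the fusion rule below (objectness multiplied by the mean segmentation confidence inside each box) is an assumption made only for illustration, and the names mean_confidence and rescore_proposals are hypothetical, not taken from the paper.

```python
from typing import List, Tuple

# (x1, y1, x2, y2) in pixel coordinates; x2 and y2 are exclusive.
Box = Tuple[int, int, int, int]


def mean_confidence(seg_map: List[List[float]], box: Box) -> float:
    """Average per-pixel pedestrian confidence inside a box."""
    x1, y1, x2, y2 = box
    values = [seg_map[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    return sum(values) / len(values) if values else 0.0


def rescore_proposals(
    seg_map: List[List[float]],
    boxes: List[Box],
    objectness: List[float],
    top_k: int,
) -> List[Tuple[Box, float]]:
    """Keep the top_k proposals ranked by a fused score.

    ASSUMPTION: the fused score is objectness * mean segmentation
    confidence. This is an illustrative rule, not the paper's method.
    """
    fused = [
        score * mean_confidence(seg_map, box)
        for box, score in zip(boxes, objectness)
    ]
    order = sorted(range(len(boxes)), key=lambda i: fused[i], reverse=True)
    return [(boxes[i], fused[i]) for i in order[:top_k]]


# Toy 4x4 confidence map with a "pedestrian" in the centre 2x2 region.
seg_map = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
boxes = [(0, 0, 2, 2), (1, 1, 3, 3)]
objectness = [0.9, 0.6]  # the off-target box has the higher raw score

kept = rescore_proposals(seg_map, boxes, objectness, top_k=1)
```

In this toy case the box with the higher raw objectness covers mostly background, so its fused score drops below that of the box aligned with the high-confidence region, and the aligned box is the one retained.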