A unified multi-task learning architecture for fast and accurate pedestrian detection
Main Authors: | , , |
---|---|
Other Authors: | |
Format: | Article |
Language: | English |
Published: | 2021 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/147488 |
Institution: | Nanyang Technological University |
Summary: | We present a unified multi-task learning architecture for fast and accurate pedestrian detection. Unlike existing methods, which often focus on either a new loss function or a new architecture, we propose an improved multi-task convolutional neural network learning architecture that effectively and efficiently fuses the tasks of pedestrian detection and semantic segmentation. To achieve this, we integrate a lightweight semantic segmentation branch into the Faster R-CNN detection framework, enabling end-to-end hard parameter sharing that boosts detection performance while maintaining computational efficiency, as follows. First, a Semantic Segmentation to Feature Module (SS2FM) refines the convolutional features in the RPN stage by integrating the features generated by the semantic segmentation branch. Second, a Semantic Segmentation to Confidence Module (SS2CM) refines the classification confidence in the RPN stage by fusing it with the semantic segmentation confidence. We also introduce an effective anchor matching point transform to alleviate feature misalignment for heavily occluded pedestrians. The proposed architecture lends itself well to more robust pedestrian detection in diverse scenarios with negligible computational overhead. In addition, it achieves high detection performance with low-resolution input images, which significantly reduces computational complexity. Experimental results on the CityPersons and Caltech datasets show that our method is the fastest among all state-of-the-art pedestrian detection methods while exhibiting competitive detection performance. |
---|
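
The two fusion steps named in the summary (SS2FM for features, SS2CM for confidence) can be illustrated with a minimal sketch. This record does not state which fusion operators the authors use, so the choices below are assumptions made only for illustration: element-wise addition for feature fusion and element-wise multiplication for confidence fusion. The function names `fuse_features` and `fuse_confidence` are hypothetical, and the maps are single-channel for brevity.

```python
from typing import List

Grid = List[List[float]]  # an H x W map of values (single channel)


def _check_same_shape(a: Grid, b: Grid) -> None:
    """Raise ValueError if the two maps differ in height or any row width."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("maps must have the same shape")


def fuse_features(rpn_feat: Grid, seg_feat: Grid) -> Grid:
    """SS2FM-style refinement (assumed operator: element-wise addition).

    Adds segmentation-branch features to the RPN features at each location.
    """
    _check_same_shape(rpn_feat, seg_feat)
    return [[r + s for r, s in zip(rr, sr)] for rr, sr in zip(rpn_feat, seg_feat)]


def fuse_confidence(rpn_conf: Grid, seg_conf: Grid) -> Grid:
    """SS2CM-style refinement (assumed operator: element-wise product).

    Scales each RPN classification score by the segmentation confidence at
    the same location, so anchors on likely background are suppressed.
    """
    _check_same_shape(rpn_conf, seg_conf)
    return [[r * s for r, s in zip(rr, sr)] for rr, sr in zip(rpn_conf, seg_conf)]


# Toy 2 x 2 maps; values are made up for illustration only.
rpn_conf = [[0.8, 0.6], [0.4, 0.5]]
seg_conf = [[1.0, 0.5], [0.0, 0.5]]
fused_conf = fuse_confidence(rpn_conf, seg_conf)

fused_feat = fuse_features([[1.0, 2.0]], [[0.5, -1.0]])
```

Under these assumed operators, a location the segmentation branch scores as background (confidence 0.0) drives the fused RPN score to zero, which is one plausible way segmentation evidence could sharpen proposal scoring. The published method may use a different fusion rule.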