Pedestrian detection using faster-RCNN
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2017 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/69526 |
Institution: | Nanyang Technological University |
Summary: | Pedestrian detection has been an active research topic for some time. Before the
advent of neural networks and deep learning, the task was comparatively simpler: extract
features and classify the object based on its features. The Histogram of Oriented
Gradients (HoG)[1] model was the first strong proposed approach, surpassing the
performance of its competitors. Following the HoG model, many variants such as
HoGlbp[2] and MultiFtr[3] tried to boost the accuracy of the existing HoG+SVM[1]
model(s). At the other end of the spectrum, the region-based convolutional neural network
(RCNN) object detection algorithm has become very popular in recent years. It boosts
performance significantly by combining two key insights. The first is to localize and
segment objects by applying a high-capacity convolutional neural network to bottom-up
region proposals. The second is that, when labeled training data is scarce, supervised
pre-training on an auxiliary task followed by domain-specific fine-tuning yields a
significant performance boost.
We train a model using our own variant of a deep architecture, with the open-source
faster-RCNN[4] implementation and existing datasets.
In this thesis, we present a custom Caffe model inspired by the ZF Neural Network and
configure it for the faster-RCNN object detection scheme. We train the system in two
ways: one with only pedestrian images and the other with multiple classes. We then test
both versions on custom-built annotated test images and compare their performance.
Finally, for the multiclass approach, we use deep visualizations to inspect the learnt
detector, and in the future work section of this thesis we discuss how it can be used.
The average precision (AP) of the pedestrian-only model was ~81%, and that of the
multiclass detector was ~70%. The results are discussed in the results and experiments
section, with details on training time, test time and accuracy. |
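The ~81% and ~70% figures in the summary are average precision values. The sketch below illustrates how such a number is typically computed for an object detector. It assumes the PASCAL VOC convention: detections are ranked by score, and a detection is a true positive if its intersection-over-union (IoU) with a not-yet-matched ground-truth box is at least 0.5. Precision is then interpolated over all recall points. This record does not state the exact evaluation protocol used in the thesis. The functions `iou` and `average_precision` are hypothetical names introduced here for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) in continuous coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[str, float, Box]],
    ground_truth: Dict[str, List[Box]],
    iou_thresh: float = 0.5,
) -> float:
    """AP for one class; detections are (image_id, score, box) triples (assumed VOC-style)."""
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    # Each ground-truth box may be matched at most once; duplicates become false positives.
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp: List[int] = []
    fp: List[int] = []
    for img, _score, box in sorted(detections, key=lambda d: -d[1]):
        best, best_j = 0.0, -1
        for j, gt_box in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt_box)
            if overlap > best:
                best, best_j = overlap, j
        if best >= iou_thresh and not matched[img][best_j]:
            matched[img][best_j] = True
            tp.append(1)
            fp.append(0)
        else:
            tp.append(0)
            fp.append(1)

    # Cumulative precision and recall down the ranked list.
    recalls: List[float] = []
    precisions: List[float] = []
    ctp = cfp = 0
    for t, f in zip(tp, fp):
        ctp += t
        cfp += f
        recalls.append(ctp / n_gt if n_gt else 0.0)
        precisions.append(ctp / (ctp + cfp))

    # All-point interpolation: make precision non-increasing, then sum area under the curve.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


if __name__ == "__main__":
    gt = {"img1": [(0, 0, 10, 10), (20, 20, 30, 30)]}
    dets = [
        ("img1", 0.9, (0, 0, 10, 10)),    # true positive (IoU 1.0)
        ("img1", 0.8, (50, 50, 60, 60)),  # false positive (no overlap)
        ("img1", 0.7, (21, 21, 30, 30)),  # true positive (IoU 0.81)
    ]
    print(round(average_precision(dets, gt), 4))  # → 0.8333
```

The older 11-point interpolation used in PASCAL VOC 2007 gives slightly different values from the all-point version shown here. A reported AP is therefore only comparable across models evaluated under the same protocol.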