Pedestrian detection using faster-RCNN

Bibliographic Details
Main Author: Munshi Harsh Hemangkumar
Other Authors: Justin Dauwels
Format: Theses and Dissertations
Language: English
Published: 2017
Online Access: http://hdl.handle.net/10356/69526
Institution: Nanyang Technological University
Description
Summary: Pedestrian detection has been an active research topic for some time. Before the advent of neural networks and deep learning, the task was the relatively simpler one of extracting features and classifying an object based on those features. The Histogram of Oriented Gradients (HoG)[1] model was the first strong proposal to surpass the performance of its competitors. It was followed by many variants, such as HoGlbp[2] and MultiFtr[3], which tried to boost the accuracy of the existing HoG+SVM[1] model. At the other end of the spectrum, the region-based convolutional neural network (RCNN) object detection algorithm has become very popular in recent years. It improves performance significantly by combining two key insights, the first of which is to localize and segment objects by applying high-capacity convolutional neural networks to bottom-up region proposals. In this thesis, we train a model using our own variant of a deep architecture with the open-source implementation of faster-RCNN[4] and existing datasets. We present a custom Caffe model, inspired by the ZF neural network, and set it up for the faster-RCNN object detection scheme. We train the system in two different ways: one with only pedestrian images and the other with multiple classes. We then test both systems on custom-built, annotated test images and compare their performance. Finally, for the multiclass approach, we use deep visualizations to inspect the learnt detector and discuss, in the future work section, how it can be used. The average precision was found to be ~81% for the pedestrian-only model and ~70% for the multiclass detector. The results are discussed in the results and experiments section, with details on training time, test time and accuracy.
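As a rough illustration of the average precision metric reported in the summary, the sketch below computes a non-interpolated average precision from a ranked list of detections. The record does not state which evaluation protocol the thesis used, so the function name, its arguments, and the choice of the non-interpolated variant are all assumptions made for illustration only.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Non-interpolated average precision for one class.

    Hypothetical helper, not taken from the thesis.
    scores           -- detector confidence for each detection
    is_true_positive -- True if that detection matched a ground-truth box
    num_ground_truth -- total number of annotated objects of the class
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")

    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, index in enumerate(order, start=1):
        if is_true_positive[index]:
            true_positives += 1
            # Precision at the rank where this true positive is retrieved.
            precision_sum += true_positives / rank

    # Ground-truth objects that are never detected contribute zero.
    return precision_sum / num_ground_truth


if __name__ == "__main__":
    # Four detections, two of them correct, two annotated pedestrians.
    print(average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, False], 2))
```

Whether a detection counts as a true positive is decided beforehand, typically by an overlap threshold against the annotated boxes; that matching step is outside this sketch.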