A machine learning method for pedestrian detection

Bibliographic Details
Main Author: Wang, Yi
Other Authors: Chau Lap Pui
Format: Theses and Dissertations
Language: English
Published: 2018
Online Access: http://hdl.handle.net/10356/75956
Institution: Nanyang Technological University
Description
Summary: Pedestrian detection has been a challenging task in computer vision research for decades. This dissertation presents a system that detects pedestrians in surveillance video using a Convolutional Neural Network combined with video processing. Detection performance is tested on the VOC2007 test set. Compared with other scenes, pedestrian detection at a bus stop often involves small targets (pedestrians) and a high degree of occlusion. To address these issues, we use the recently proposed Single Shot MultiBox Detector (SSD) algorithm to design a pedestrian detection system for a bus stop environment. We employ ResNet50 to extract features, rather than the VGG16 used in the SSD paper. Trained on the VOC2007+2012 training set, this method improves the mean average precision (mAP) from 79.7 to 79.9 on the VOC2007 test set. Moreover, by training the network with the COCO dataset, we achieve the best result, an mAP of 85.1, compared with Fast R-CNN, Faster R-CNN, and the original SSD. Apart from the machine learning algorithms, the work also involves video processing. The system collects video from a surveillance camera at a campus bus stop. It then detects the pedestrians in the region of interest (ROI) of each frame and divides them into two groups according to their positions: “wait”, for pedestrians waiting at the bus stop, and “cross”, for pedestrians crossing the road. Finally, the system marks the total number of pedestrians in the ROI. After it has finished detecting a single frame of the surveillance video, it automatically reads and detects the next frame. To achieve higher detection accuracy and faster speed, techniques such as cropping each image to the ROI and a queue or multi-threading structure are implemented.
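The grouping step described in the summary can be illustrated with a minimal sketch. The summary says only that pedestrians are divided "according to their positions", so everything specific here is an assumption made for illustration: the `Box` type, the rectangular ROI, the use of the bottom-centre of each box as the pedestrian's foot point, and a single horizontal `curb_y` line separating the waiting area from the road. The dissertation's actual rule may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical ROI layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Roi = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned detection box in image coordinates (y grows downward)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def foot_point(self) -> Tuple[float, float]:
        # Bottom-centre of the box, used here as a stand-in for where
        # the pedestrian is standing (an assumption, not from the source).
        return ((self.x1 + self.x2) / 2.0, self.y2)


def in_roi(point: Tuple[float, float], roi: Roi) -> bool:
    """Return True if the point lies inside the rectangular ROI."""
    x, y = point
    x_min, y_min, x_max, y_max = roi
    return x_min <= x <= x_max and y_min <= y <= y_max


def group_pedestrians(boxes: List[Box], roi: Roi, curb_y: float) -> Dict[str, int]:
    """Count pedestrians inside the ROI as "wait" or "cross".

    Assumed rule: a foot point above the curb line (smaller y) is on the
    bus-stop side and counts as "wait"; otherwise it counts as "cross".
    """
    counts = {"wait": 0, "cross": 0}
    for box in boxes:
        foot = box.foot_point()
        if not in_roi(foot, roi):
            continue  # detections outside the ROI are ignored
        label = "wait" if foot[1] < curb_y else "cross"
        counts[label] += 1
    counts["total"] = counts["wait"] + counts["cross"]
    return counts


if __name__ == "__main__":
    detections = [Box(10, 10, 20, 50), Box(30, 40, 50, 90), Box(200, 10, 220, 50)]
    print(group_pedestrians(detections, roi=(0, 0, 100, 100), curb_y=60))
```

In a full pipeline, the boxes would come from the detector's output on each frame, and the returned total would correspond to the pedestrian count the system marks in the ROI.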