People detection and tracking in videos at MRT stations in Singapore

Bibliographic Details
Main Author: Hu, Leimin
Other Authors: Yuan Junsong
Format: Final Year Project
Language: English
Published: 2015
Online Access: http://hdl.handle.net/10356/64168
Institution: Nanyang Technological University
Description
Summary: Pedestrian detection is performed with the Aggregated Channel Features detector, which can estimate the channel features of the original image from the nearest downsampled channel scale via bilinear interpolation. Because interpolation is faster than feature computation, this gives an acceptable trade-off between computational complexity and detection accuracy. Adaptive boosting is used to train decision trees and combine them over all features in the aggregated channels. The resulting classifier discriminates pedestrians from the background of the MRT station. Pedestrians present in each frame are annotated with a padded rectangular box, and the boxes are resized to an aspect ratio set from analysis of the collected video data. Non-maximal suppression, with the confidence score as the criterion, removes multiple detections of the same pedestrian. The key concept of the tracking model is the spatio-temporal relation. The model is built on the spatial relationship between the target and its local context region, and it is updated over time by learning the spatial model derived from the previous frame. A confidence map is introduced to reduce location ambiguity, and its shape parameter is adjusted to avoid overfitting. An adaptive scale scheme is proposed to handle changes in the scale of the tracked target. This fast tracking method requires only four Fast Fourier Transform operations per frame, giving a computational complexity of O(MN log(MN)) for a local context region of M x N pixels.
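The non-maximal suppression step mentioned in the summary can be illustrated with a minimal Python sketch. This is not the project's code: the greedy strategy, the intersection-over-union overlap measure, the 0.5 threshold, and the names `iou` and `non_max_suppression` are all assumptions chosen for illustration. The summary states only that the confidence score decides which of several overlapping detections is kept.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, overlap_threshold=0.5):
    """Greedy suppression: return indices of the boxes that are kept.

    Boxes are visited from highest to lowest confidence score. A box is
    kept only if it does not overlap any already-kept box by more than
    overlap_threshold (an assumed value, not taken from the project).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= overlap_threshold for k in keep):
            keep.append(i)
    return keep
```

With two heavily overlapping boxes and one distant box, the lower-scoring box of the overlapping pair is discarded and the other two are kept.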