Human detection and tracking in surveillance videos

The thesis addresses the following challenging problems of detecting and tracking humans in the presence of occlusions in typical surveillance videos: (1) adaptation of semantic-part-based human detectors to a new surveillance video sequence when detectors trained on other video data do not perform...

Bibliographic Details
Main Author: Wang, Bing
Other Authors: Chan Kap Luk
Format: Theses and Dissertations
Language: English
Published: 2016
Subjects: DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Online Access: https://hdl.handle.net/10356/65919
Institution: Nanyang Technological University
id sg-ntu-dr.10356-65919
record_format dspace
spelling sg-ntu-dr.10356-65919 2023-07-04T16:28:02Z Human detection and tracking in surveillance videos Wang, Bing Chan Kap Luk Wang Gang School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems DOCTOR OF PHILOSOPHY (EEE) 2016-01-19T01:44:26Z 2016-01-19T01:44:26Z 2016 Thesis Wang, B. (2016). Human detection and tracking in surveillance videos. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/65919 10.32657/10356/65919 en 155 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
spellingShingle DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Wang, Bing
Human detection and tracking in surveillance videos
description The thesis addresses the following challenging problems of detecting and tracking humans in the presence of occlusions in typical surveillance videos: (1) adaptation of semantic-part-based human detectors to a new surveillance video sequence when detectors trained on other video data do not perform well on the new video data; (2) tracking of humans with person identification that minimizes identification errors over longer tracking periods; and (3) hierarchical spatial and temporal analysis for discriminative tracking of human targets. The thesis aims to improve the state-of-the-art performance in human detection and tracking by studying human detectors and the extended tracking of track segments (tracklets) generated from short-term tracking of detection responses.
For the adaptation of semantic-part-based human detectors to a new surveillance video sequence, a unified deep CNN model for joint learning of features, semantic pedestrian part detectors and a transfer learning model is developed. The components within this deep CNN model interact with each other in the learning process, which facilitates the optimization of the learned components during cooperative learning. In particular, an adaptation layer is proposed to embed the capability of knowledge transfer into the CNN model. As a result, the proposed transferred CNN (T-CNN) model is able to transfer the visual knowledge of the semantic pedestrian parts from the source data to the target data. Extensive experimental evaluations show that the proposed method outperforms other deep-learning-based methods in detection performance. Moreover, the adaptive deep features can be complementary to the pre-defined features used by other state-of-the-art methods.
For tracking of humans with person identification that minimizes identification errors over longer tracking periods, a novel method for tracklet association by network flow optimization, based on online target-specific metric learning and coherent dynamics estimation, is developed. The proposed framework aims to exploit appearance and motion cues to prevent identity switches during tracking and also to recover missed detections. The target-specific metrics (appearance cue) and motion dynamics (motion cue) are learned and estimated online, i.e. during the tracking process. Furthermore, a learning algorithm that learns the weights of the motion and appearance cues for tracklet affinity models is proposed to handle some difficult situations. Extensive evaluations following state-of-the-art practices have been conducted, and their results show that the proposed method improves on some existing state-of-the-art methods.
For hierarchical spatial and temporal analysis for discriminative tracking of human targets, a novel unified deep model for tracklet association, inspired by recent advances in convolutional neural network (CNN) architectures, is developed; it jointly learns the CNNs and temporally constrained metrics. Furthermore, a novel loss function incorporating a temporally constrained multi-task learning mechanism is developed to make the deep model more effective in solving the tracklet association problem. Extensive experimental comparisons with state-of-the-art methods demonstrate the effectiveness and superiority of the proposed unified deep model.
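As a rough illustration of the tracklet association idea in the description, the sketch below scores pairs of tracklets with a weighted sum of an appearance cue and a motion cue and then picks the best one-to-one matching. It is not the thesis's method: the thesis uses network flow optimization with learned metrics and learned cue weights, whereas this sketch substitutes cosine similarity, a constant-velocity prediction, fixed weights, and an exhaustive search over matchings. All names (appearance_affinity, motion_affinity, associate, w_app, w_mot, sigma) and the tracklet layout are hypothetical.

```python
from itertools import permutations
import math


def appearance_affinity(feat_a, feat_b):
    """Cosine similarity between two appearance feature vectors (assumed stand-in
    for a learned target-specific metric)."""
    dot = sum(x * y for x, y in zip(feat_a, feat_b))
    norm = math.sqrt(sum(x * x for x in feat_a)) * math.sqrt(sum(y * y for y in feat_b))
    return dot / norm if norm else 0.0


def motion_affinity(tail, head, sigma=10.0):
    """Gaussian score on the distance between a constant-velocity prediction from
    the end of one tracklet and the start of another.
    tail = (x, y, vx, vy, t_end); head = (x, y, t_start)."""
    x, y, vx, vy, t_end = tail
    hx, hy, t_start = head
    dt = t_start - t_end
    if dt <= 0:
        return 0.0  # the second tracklet must start after the first one ends
    px, py = x + vx * dt, y + vy * dt
    dist_sq = (px - hx) ** 2 + (py - hy) ** 2
    return math.exp(-dist_sq / (2 * sigma ** 2))


def associate(ended, started, w_app=0.5, w_mot=0.5):
    """Return (i, j) pairs linking ended[i] to started[j] with maximum total affinity.
    Exhaustive search, so only suitable for a handful of tracklets; returns []
    when there are more ended tracklets than started ones."""
    def score(a, b):
        return (w_app * appearance_affinity(a["feat"], b["feat"])
                + w_mot * motion_affinity(a["tail"], b["head"]))

    best_pairs, best_total = [], float("-inf")
    for perm in permutations(range(len(started)), len(ended)):
        total = sum(score(ended[i], started[j]) for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(perm))
    return best_pairs


# Two targets crossing paths: one moving right, one moving left (made-up data).
ended = [
    {"feat": [1.0, 0.0], "tail": (0.0, 0.0, 1.0, 0.0, 10)},
    {"feat": [0.0, 1.0], "tail": (50.0, 0.0, -1.0, 0.0, 10)},
]
started = [
    {"feat": [0.1, 0.9], "head": (45.0, 0.0, 15)},
    {"feat": [0.9, 0.1], "head": (5.0, 0.0, 15)},
]
print(associate(ended, started))  # [(0, 1), (1, 0)]
```

In this toy example both cues agree, so each ended tracklet is linked to the later tracklet that matches its appearance and lies on its predicted path. The exhaustive search is only a placeholder for a proper assignment or network flow solver.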
author2 Chan Kap Luk
author_facet Chan Kap Luk
Wang, Bing
format Theses and Dissertations
author Wang, Bing
author_sort Wang, Bing
title Human detection and tracking in surveillance videos
title_short Human detection and tracking in surveillance videos
title_full Human detection and tracking in surveillance videos
title_fullStr Human detection and tracking in surveillance videos
title_full_unstemmed Human detection and tracking in surveillance videos
title_sort human detection and tracking in surveillance videos
publishDate 2016
url https://hdl.handle.net/10356/65919
_version_ 1772828976283123712