Human detection and tracking in surveillance videos

The thesis addresses the following challenging problems of detecting and tracking humans in the presence of occlusions in typical surveillance videos: (1) adaptation of semantic-part-based human detectors to a new surveillance video sequence when detectors trained on other video data do not perform...

Bibliographic Details
Main Author:	Wang, Bing
Other Authors:	Chan Kap Luk
Format:	Theses and Dissertations
Language:	English
Published:	2016
Subjects:	DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Online Access:	https://hdl.handle.net/10356/65919
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-65919
record_format	dspace
spelling	sg-ntu-dr.10356-65919 2023-07-04T16:28:02Z Human detection and tracking in surveillance videos Wang, Bing Chan Kap Luk Wang Gang School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems The thesis addresses the following challenging problems of detecting and tracking humans in the presence of occlusions in typical surveillance videos: (1) adaptation of semantic-part-based human detectors to a new surveillance video sequence when detectors trained on other video data do not perform well on the new video data; (2) tracking of humans with person identification, minimizing identification errors over longer tracking periods; and (3) hierarchical spatial and temporal analysis for discriminative tracking of human targets. The thesis aims to improve the state-of-the-art performance in human detection and tracking by studying human detectors and the extended tracking of track segments (tracklets) generated from short-term tracking of detection responses. For the adaptation of semantic-part-based human detectors to a new surveillance video sequence, a unified deep CNN model for joint learning of features, semantic pedestrian part detectors and a transfer learning model is developed. The components within this deep CNN model interact with each other in the learning process, which facilitates the optimization of the learned components during the co-operative learning. In particular, an adaptation layer is proposed to embed the capability of knowledge transfer into the CNN model. As a result, the proposed transferred CNN (T-CNN) model is able to transfer the visual knowledge of the semantic pedestrian parts from the source data to the target data. Extensive experimental evaluations show that the proposed method is better than other deep-learning-based methods in terms of detection performance.
Moreover, the adaptive deep features can be complementary to the pre-defined features used by other state-of-the-art methods. For tracking of humans with person identification, minimizing identification errors over longer tracking periods, a novel method for tracklet association by network flow optimization, based on online target-specific metric learning and coherent dynamics estimation, is developed. The proposed framework aims to exploit appearance and motion cues to prevent identity switches during tracking and also to recover missed detections. The target-specific metrics (appearance cue) and motion dynamics (motion cue) are learned and estimated online, i.e. during the tracking process. Furthermore, an algorithm that learns the weights of the motion and appearance cues for tracklet affinity models is proposed to handle some difficult situations. Extensive evaluations following state-of-the-art practices have been conducted, and the results show improvements by the proposed method over some existing state-of-the-art methods. For hierarchical spatial and temporal analysis for discriminative tracking of human targets, inspired by recent advances in convolutional neural network (CNN) architectures, a novel unified deep model for tracklet association, which can jointly learn the CNNs and temporally constrained metrics, is developed. Furthermore, a novel loss function incorporating a temporally constrained multi-task learning mechanism is developed to make the deep model more effective in solving the tracklet association problem. Extensive experimental comparisons with state-of-the-art methods demonstrate the effectiveness and superiority of the proposed unified deep model. DOCTOR OF PHILOSOPHY (EEE) 2016-01-19T01:44:26Z 2016-01-19T01:44:26Z 2016 Thesis Wang, B. (2016). Human detection and tracking in surveillance videos. Doctoral thesis, Nanyang Technological University, Singapore.
https://hdl.handle.net/10356/65919 10.32657/10356/65919 en 155 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
spellingShingle	DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems Wang, Bing Human detection and tracking in surveillance videos
description	The thesis addresses the following challenging problems of detecting and tracking humans in the presence of occlusions in typical surveillance videos: (1) adaptation of semantic-part-based human detectors to a new surveillance video sequence when detectors trained on other video data do not perform well on the new video data; (2) tracking of humans with person identification, minimizing identification errors over longer tracking periods; and (3) hierarchical spatial and temporal analysis for discriminative tracking of human targets. The thesis aims to improve the state-of-the-art performance in human detection and tracking by studying human detectors and the extended tracking of track segments (tracklets) generated from short-term tracking of detection responses. For the adaptation of semantic-part-based human detectors to a new surveillance video sequence, a unified deep CNN model for joint learning of features, semantic pedestrian part detectors and a transfer learning model is developed. The components within this deep CNN model interact with each other in the learning process, which facilitates the optimization of the learned components during the co-operative learning. In particular, an adaptation layer is proposed to embed the capability of knowledge transfer into the CNN model. As a result, the proposed transferred CNN (T-CNN) model is able to transfer the visual knowledge of the semantic pedestrian parts from the source data to the target data. Extensive experimental evaluations show that the proposed method is better than other deep-learning-based methods in terms of detection performance. Moreover, the adaptive deep features can be complementary to the pre-defined features used by other state-of-the-art methods.
For tracking of humans with person identification, minimizing identification errors over longer tracking periods, a novel method for tracklet association by network flow optimization, based on online target-specific metric learning and coherent dynamics estimation, is developed. The proposed framework aims to exploit appearance and motion cues to prevent identity switches during tracking and also to recover missed detections. The target-specific metrics (appearance cue) and motion dynamics (motion cue) are learned and estimated online, i.e. during the tracking process. Furthermore, an algorithm that learns the weights of the motion and appearance cues for tracklet affinity models is proposed to handle some difficult situations. Extensive evaluations following state-of-the-art practices have been conducted, and the results show improvements by the proposed method over some existing state-of-the-art methods. For hierarchical spatial and temporal analysis for discriminative tracking of human targets, inspired by recent advances in convolutional neural network (CNN) architectures, a novel unified deep model for tracklet association, which can jointly learn the CNNs and temporally constrained metrics, is developed. Furthermore, a novel loss function incorporating a temporally constrained multi-task learning mechanism is developed to make the deep model more effective in solving the tracklet association problem. Extensive experimental comparisons with state-of-the-art methods demonstrate the effectiveness and superiority of the proposed unified deep model.
author2	Chan Kap Luk
author_facet	Chan Kap Luk Wang, Bing
format	Theses and Dissertations
author	Wang, Bing
author_sort	Wang, Bing
title	Human detection and tracking in surveillance videos
title_short	Human detection and tracking in surveillance videos
title_full	Human detection and tracking in surveillance videos
title_fullStr	Human detection and tracking in surveillance videos
title_full_unstemmed	Human detection and tracking in surveillance videos
title_sort	human detection and tracking in surveillance videos
publishDate	2016
url	https://hdl.handle.net/10356/65919
_version_	1772828976283123712

Similar Items