Learning representations for human re-identification

This thesis addresses the problem of Human Re-Identification, the task of associating pedestrians over multiple camera views. Human re-identification is a particularly interesting problem due to its applications in visual surveillance. Given a probe image of a subject, the objective is to identif...

Bibliographic Details
Main Author:	Varior, Rahul Rama
Other Authors:	Wang Gang
Format:	Theses and Dissertations
Language:	English
Published:	2017
Subjects:	DRNTU::Engineering::Electrical and electronic engineering
Online Access:	http://hdl.handle.net/10356/70912
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-70912
record_format	dspace
spelling	sg-ntu-dr.10356-70912 2023-07-04T17:13:48Z Learning representations for human re-identification Varior, Rahul Rama Wang Gang Kot Chichung, Alex School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering This thesis addresses the problem of Human Re-Identification, the task of associating pedestrians over multiple camera views. Human re-identification is a particularly interesting problem due to its applications in visual surveillance. Given a probe image of a subject, the objective is to identify a set of matching images of the same subject from a gallery set, most of which are captured by a different camera. Instead of manually searching through images captured by various cameras, it is desirable to automate human re-identification, as it can save an enormous amount of manual labor. However, human re-identification is fundamentally a challenging problem due to cluttered backgrounds, ambiguity in visual appearance, and variations in illumination, pose, and viewpoint. The goal of this thesis is to present various feature learning architectures from different perspectives to tackle the aforementioned challenges in human re-identification. Public places are equipped with thousands of surveillance cameras capturing videos round the clock. Since no biometric aspects such as fingerprint, gait, or facial cues are accessible from the surveillance videos, visual appearance is the main cue for re-identifying pedestrians. Intuitively, to distinguish pedestrians from surveillance videos, color features can be an important aspect. However, varying illumination and environmental conditions pose a great challenge, as the perceived color of the subject may vary. In existing research, color features are used as they are, i.e. features are extracted from raw pixel values or weakly corrected pixels. 
In the first part of the thesis, an invariant color feature learning framework is presented to efficiently map and encode the weakly corrected pixel values in an invariant space where the representations of similar colors are close to each other. In the second part of the thesis, contextual information is incorporated into the local features. Conventional features are extracted locally and independently of other regions. However, such features lack the global context of the image. Therefore, it is desirable to incorporate the contextual information into the local features. In order to encode such information, Long Short-Term Memory (LSTM) cells, a variant of the Recurrent Neural Network architecture, are used. The sophisticated gating mechanisms inside the LSTM cells have the flexibility to selectively propagate the relevant contextual information to the rest of the network. To eliminate the need for hand-crafted features, an end-to-end trainable Siamese Convolutional Neural Network (S-CNN) architecture is first proposed. However, in conventional S-CNN architectures, the representations of the images are compared only at the final stage, when the feature representations mature. In this setting, the network is at risk of failing to capture and propagate subtle local patterns that can distinguish positive pairs from hard-negative pairs. Therefore, a novel gating mechanism is modeled to selectively boost and propagate such common patterns from the middle layers to the final layers of the network. Extensive experimental evaluation and comparisons with baseline algorithms demonstrate the effectiveness of the proposed feature learning models. Doctor of Philosophy (EEE) 2017-05-12T03:58:55Z 2017-05-12T03:58:55Z 2017 Thesis Varior, R. R. (2017). Learning representations for human re-identification. Doctoral thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/70912 10.32657/10356/70912 en 170 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Electrical and electronic engineering
spellingShingle	DRNTU::Engineering::Electrical and electronic engineering Varior, Rahul Rama Learning representations for human re-identification
description	This thesis addresses the problem of Human Re-Identification, the task of associating pedestrians over multiple camera views. Human re-identification is a particularly interesting problem due to its applications in visual surveillance. Given a probe image of a subject, the objective is to identify a set of matching images of the same subject from a gallery set, most of which are captured by a different camera. Instead of manually searching through images captured by various cameras, it is desirable to automate human re-identification, as it can save an enormous amount of manual labor. However, human re-identification is fundamentally a challenging problem due to cluttered backgrounds, ambiguity in visual appearance, and variations in illumination, pose, and viewpoint. The goal of this thesis is to present various feature learning architectures from different perspectives to tackle the aforementioned challenges in human re-identification. Public places are equipped with thousands of surveillance cameras capturing videos round the clock. Since no biometric aspects such as fingerprint, gait, or facial cues are accessible from the surveillance videos, visual appearance is the main cue for re-identifying pedestrians. Intuitively, to distinguish pedestrians from surveillance videos, color features can be an important aspect. However, varying illumination and environmental conditions pose a great challenge, as the perceived color of the subject may vary. In existing research, color features are used as they are, i.e. features are extracted from raw pixel values or weakly corrected pixels. In the first part of the thesis, an invariant color feature learning framework is presented to efficiently map and encode the weakly corrected pixel values in an invariant space where the representations of similar colors are close to each other. In the second part of the thesis, contextual information is incorporated into the local features. 
Conventional features are extracted locally and independently of other regions. However, such features lack the global context of the image. Therefore, it is desirable to incorporate the contextual information into the local features. In order to encode such information, Long Short-Term Memory (LSTM) cells, a variant of the Recurrent Neural Network architecture, are used. The sophisticated gating mechanisms inside the LSTM cells have the flexibility to selectively propagate the relevant contextual information to the rest of the network. To eliminate the need for hand-crafted features, an end-to-end trainable Siamese Convolutional Neural Network (S-CNN) architecture is first proposed. However, in conventional S-CNN architectures, the representations of the images are compared only at the final stage, when the feature representations mature. In this setting, the network is at risk of failing to capture and propagate subtle local patterns that can distinguish positive pairs from hard-negative pairs. Therefore, a novel gating mechanism is modeled to selectively boost and propagate such common patterns from the middle layers to the final layers of the network. Extensive experimental evaluation and comparisons with baseline algorithms demonstrate the effectiveness of the proposed feature learning models.
author2	Wang Gang
author_facet	Wang Gang Varior, Rahul Rama
format	Theses and Dissertations
author	Varior, Rahul Rama
author_sort	Varior, Rahul Rama
title	Learning representations for human re-identification
publishDate	2017
url	http://hdl.handle.net/10356/70912
_version_	1772826948663246848

