Learning representations for human re-identification

This thesis addresses the problem of Human Re-Identification, the task of associating pedestrians over multiple camera views. Human re-identification is a particularly interesting problem due to its applications in visual surveillance. Given a probe image of a subject, the objective is to identif...

Bibliographic Details
Main Author: Varior, Rahul Rama
Other Authors: Wang Gang
Format: Theses and Dissertations
Language: English
Published: 2017
Subjects:
Online Access: http://hdl.handle.net/10356/70912
Institution: Nanyang Technological University
id sg-ntu-dr.10356-70912
record_format dspace
spelling sg-ntu-dr.10356-70912 2023-07-04T17:13:48Z Learning representations for human re-identification Varior, Rahul Rama Wang Gang Kot Chichung, Alex School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering Doctor of Philosophy (EEE) 2017-05-12T03:58:55Z 2017-05-12T03:58:55Z 2017 Thesis Varior, R. R. (2017). Learning representations for human re-identification. Doctoral thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/70912 10.32657/10356/70912 en 170 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Electrical and electronic engineering
description This thesis addresses the problem of human re-identification, the task of associating pedestrians across multiple camera views. Human re-identification is a particularly interesting problem because of its applications in visual surveillance. Given a probe image of a subject, the objective is to identify a set of matching images of the same subject from a gallery set, most of which are captured by a different camera. Instead of manually searching through images captured by various cameras, it is desirable to automate human re-identification, as this can save an enormous amount of manual labor. However, human re-identification is fundamentally challenging because of cluttered backgrounds, ambiguity in visual appearance, and variations in illumination, pose and viewpoint. The goal of this thesis is to present feature learning architectures that tackle these challenges from different perspectives. Public places are equipped with several thousand surveillance cameras capturing video round the clock. Since biometric cues such as fingerprints, gait or facial features are not accessible from surveillance videos, visual appearance is the main cue for re-identifying pedestrians. Intuitively, color features can be important for distinguishing pedestrians in surveillance videos. However, varying illumination and environmental conditions pose a great challenge, as the perceived color of the subject may vary. In existing research, color features are used as they are, i.e. features are extracted from raw pixel values or weakly corrected pixels. In the first part of the thesis, an invariant color feature learning framework is presented to efficiently map and encode the weakly corrected pixel values in an invariant space where the representations of similar colors are close to each other.
In the second part of the thesis, contextual information is incorporated into the local features. Conventional features are extracted locally and independently of other regions, so they lack the global context of the image. To encode this context, Long Short-Term Memory (LSTM) cells, a variant of the Recurrent Neural Network architecture, are used. The sophisticated gating mechanisms inside the LSTM cells have the flexibility to selectively propagate the relevant contextual information to the rest of the network. To eliminate the need for hand-crafted features, an end-to-end trainable Siamese Convolutional Neural Network (S-CNN) architecture is first proposed. However, in conventional S-CNN architectures, the representations of the images are compared only at the final stage, when the feature representations mature. In this setting, the network is at risk of failing to capture and propagate subtle local patterns that can distinguish positive pairs from hard-negative pairs. Therefore, a novel gating mechanism is modeled to selectively boost and propagate such common patterns from the middle layers to the final layers of the network. Extensive experimental evaluation and comparisons with baseline algorithms demonstrate the effectiveness of the proposed feature learning models.
author2 Wang Gang
author_facet Wang Gang
Varior, Rahul Rama
format Theses and Dissertations
author Varior, Rahul Rama
author_sort Varior, Rahul Rama
title Learning representations for human re-identification
publishDate 2017
url http://hdl.handle.net/10356/70912
_version_ 1772826948663246848
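
Illustrative note: the probe-to-gallery matching task summarized in the description can be pictured as ranking gallery images by their distance to the probe in feature space. The sketch below is only an assumption-laden illustration of that retrieval step, not the method proposed in the thesis: the feature vectors are made-up three-dimensional values, the image ids (`cam2_person_a` and so on) are hypothetical, and plain Euclidean distance stands in for whatever learned similarity the thesis actually uses.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_gallery(probe, gallery):
    """Return gallery ids sorted from closest to farthest from the probe.

    `gallery` maps a (hypothetical) image id to its feature vector.
    """
    return sorted(gallery, key=lambda image_id: euclidean(probe, gallery[image_id]))


# Toy, made-up features; a real system would use learned representations.
probe = [0.9, 0.1, 0.3]
gallery = {
    "cam2_person_a": [0.8, 0.2, 0.3],
    "cam2_person_b": [0.1, 0.9, 0.5],
    "cam3_person_c": [0.5, 0.5, 0.5],
}

ranking = rank_gallery(probe, gallery)
print(ranking)
```

In this toy setup the gallery entry whose features lie nearest the probe is listed first; better learned representations would be expected to place images of the same pedestrian nearer the top of such a ranking.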