Learning representations for human re-identification
This thesis addresses the problem of human re-identification, the task of associating pedestrians across multiple camera views. Human re-identification is a particularly interesting problem because of its applications in visual surveillance. Given a probe image of a subject, the objective is to identif...
Main Author: | Varior, Rahul Rama |
---|---|
Other Authors: | Wang Gang |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2017 |
Subjects: | DRNTU::Engineering::Electrical and electronic engineering |
Online Access: | http://hdl.handle.net/10356/70912 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-70912 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-70912; 2023-07-04T17:13:48Z; Learning representations for human re-identification; Varior, Rahul Rama; Wang Gang; Kot Chichung, Alex; School of Electrical and Electronic Engineering; DRNTU::Engineering::Electrical and electronic engineering; Doctor of Philosophy (EEE); 2017-05-12T03:58:55Z; 2017-05-12T03:58:55Z; 2017; Thesis; Varior, R. R. (2017). Learning representations for human re-identification. Doctoral thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/70912; 10.32657/10356/70912; en; 170 p.; application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Electrical and electronic engineering |
description |
This thesis addresses the problem of human re-identification, the task of associating pedestrians across multiple camera views. Human re-identification is a particularly interesting problem because of its applications in visual surveillance. Given a probe image of a subject, the objective is to identify a set of matching images of the same subject from a gallery set whose images are mostly captured by a different camera. Instead of manually searching through images captured by various cameras, it is desirable to automate human re-identification, as this can save an enormous amount of manual labor. However, human re-identification is fundamentally challenging due to cluttered backgrounds, ambiguity in visual appearance, and variations in illumination, pose, and viewpoint. The goal of this thesis is to present feature learning architectures, each approaching the problem from a different perspective, to tackle these challenges.
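The probe/gallery setting described above is commonly evaluated by ranking the gallery for each probe and reporting the Cumulative Matching Characteristic (CMC). The following is a minimal Python sketch under stated assumptions: each image is already summarized by a fixed-length feature vector, similarity is plain Euclidean distance, and every probe identity appears in the gallery. The function names `rank_gallery` and `cmc` are hypothetical and not taken from the thesis.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_gallery(probe: Sequence[float], gallery: List[Sequence[float]]) -> List[int]:
    """Gallery indices ordered from most to least similar to the probe."""
    return sorted(range(len(gallery)), key=lambda i: euclidean(probe, gallery[i]))


def cmc(probes, probe_ids, gallery, gallery_ids, max_rank: int) -> List[float]:
    """Fraction of probes whose true identity appears in the top-k, for k = 1..max_rank.

    Assumption (hypothetical setup): every probe identity occurs at least once
    in the gallery, otherwise `next` below raises StopIteration.
    """
    hits = [0] * max_rank
    for feat, pid in zip(probes, probe_ids):
        order = rank_gallery(feat, gallery)
        # 0-based rank of the first gallery image sharing the probe's identity
        first = next(r for r, i in enumerate(order) if gallery_ids[i] == pid)
        # a match found at rank `first` counts for every k >= first
        for k in range(first, max_rank):
            hits[k] += 1
    return [h / len(probes) for h in hits]


if __name__ == "__main__":
    gallery = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    gallery_ids = ["a", "b", "c"]
    probes = [[0.9, 1.1], [4.0, 4.0]]
    probe_ids = ["b", "a"]
    print(cmc(probes, probe_ids, gallery, gallery_ids, max_rank=3))  # prints [0.5, 0.5, 1.0]
```

In the toy data, the first probe finds its match at rank 1 and the second only at rank 3, so the curve rises from 0.5 to 1.0. Real systems replace the hand-written vectors with learned representations such as those the thesis proposes.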
Public places are equipped with several thousand surveillance cameras capturing video around the clock. Since biometric cues such as fingerprints, gait, or facial features are not accessible from surveillance videos, visual appearance is the main cue for re-identifying pedestrians. Intuitively, color is an important feature for distinguishing pedestrians in surveillance videos. However, varying illumination and environmental conditions pose a great challenge, because the perceived color of the subject may vary. In existing research, color features are used as is, that is, they are extracted from raw pixel values or weakly corrected pixels. In the first part of the thesis, an invariant color feature learning framework is presented to efficiently map and encode the weakly corrected pixel values into an invariant space where the representations of similar colors are close to each other.
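The abstract does not state how the invariant color space is trained. One generic way to express the goal that similar colors map close together while dissimilar colors stay apart is a margin-based contrastive loss on pairs of embedded colors. The sketch below assumes that loss purely as an illustration, not as the thesis's objective; `contrastive_loss` and its `margin` parameter are hypothetical names.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedded color vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(f1: Sequence[float], f2: Sequence[float],
                     same: bool, margin: float = 1.0) -> float:
    """Margin-based contrastive loss (illustrative stand-in, not the thesis's loss).

    f1, f2: embeddings of two colors in the learned space.
    same:   True if the two colors should be treated as the same color.
    """
    d = euclidean(f1, f2)
    if same:
        # Similar colors: any distance is penalized, pulling the embeddings together.
        return 0.5 * d ** 2
    # Dissimilar colors: penalized only when closer than the margin.
    return 0.5 * max(0.0, margin - d) ** 2
```

Minimizing this over many labeled pairs shapes the embedding so that distance reflects perceived color similarity; the choice of `margin` sets how far apart dissimilar colors must be before they stop contributing to the loss.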
In the second part of the thesis, contextual information is incorporated into the local features. Conventional features are extracted locally, independently of other regions, and therefore lack the global context of the image. It is thus desirable to incorporate contextual information into the local features. To encode such information, Long Short-Term Memory (LSTM) cells, a variant of the Recurrent Neural Network (RNN) architecture, are used. The gating mechanisms inside the LSTM cells give them the flexibility to selectively propagate the relevant contextual information to the rest of the network.
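The recurrence that lets an LSTM carry context from one local region to the next can be sketched in plain Python. This sketch assumes the image has been split into a sequence of local regions (for example, horizontal stripes), each already described by a feature vector; the weights are random placeholders rather than trained values, and `LSTMCell` and `contextual_features` are hypothetical names. The gate equations are the standard LSTM ones.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LSTMCell:
    """Standard LSTM cell: input (i), forget (f), output (o) gates and candidate (c)."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Matrix:
            # Small random placeholder weights; a real model would learn these.
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # One (W, U, b) triple per gate: W acts on the input, U on the previous hidden state.
        self.params = {
            gate: (init(hidden_dim, input_dim), init(hidden_dim, hidden_dim), [0.0] * hidden_dim)
            for gate in ("i", "f", "o", "c")
        }

    def _pre(self, gate: str, x: Vector, h: Vector) -> Vector:
        w, u, b = self.params[gate]
        return [a + r + bias for a, r, bias in zip(matvec(w, x), matvec(u, h), b)]

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        i = [sigmoid(z) for z in self._pre("i", x, h)]
        f = [sigmoid(z) for z in self._pre("f", x, h)]
        o = [sigmoid(z) for z in self._pre("o", x, h)]
        cand = [math.tanh(z) for z in self._pre("c", x, h)]
        # Forget gate keeps part of the old memory; input gate admits new content.
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
        # Output gate decides how much of the memory is exposed as the feature.
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def contextual_features(cell: LSTMCell, regions: List[Vector]) -> List[Vector]:
    """Run the cell over local region features in order; each output carries
    context from all earlier regions."""
    h = [0.0] * cell.hidden_dim
    c = [0.0] * cell.hidden_dim
    outputs = []
    for x in regions:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs
```

Because each output depends on the hidden and cell states carried over from earlier regions, changing only the first region alters the feature produced for the last one, which is the contextual effect described above.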
To eliminate the need for hand-crafted features, an end-to-end trainable Siamese Convolutional Neural Network (S-CNN) architecture is first proposed. However, in conventional S-CNN architectures, the representations of the two images are compared only at the final stage, once the feature representations have matured. In this setting, the network risks failing to capture and propagate subtle local patterns that can distinguish positive pairs from hard-negative pairs. Therefore, a novel gating mechanism is modeled to selectively boost and propagate such common patterns from the middle layers to the final layers of the network. Extensive experimental evaluation and comparisons with baseline algorithms demonstrate the effectiveness of the proposed feature learning models. |
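The idea of boosting patterns that are common to both Siamese branches at a middle layer can be illustrated with a simplified gate. In this sketch, which is an assumption rather than the thesis's exact formulation, each branch's mid-level output is summarized as one feature vector per horizontal row, a Gaussian of the per-channel difference yields a gate close to 1 where the two images agree, and each feature is boosted as x + x·gate. `matching_gate` and `width` are hypothetical names.

```python
import math
from typing import List, Tuple

Rows = List[List[float]]  # rows x channels: one feature vector per horizontal row


def matching_gate(feat_a: Rows, feat_b: Rows, width: float = 1.0) -> Tuple[Rows, Rows]:
    """Boost channels whose values agree between the two branches (illustrative gate).

    feat_a, feat_b: mid-level features from the same layer of the two branches.
    width: hypothetical bandwidth of the Gaussian similarity.
    """
    out_a, out_b = [], []
    for row_a, row_b in zip(feat_a, feat_b):
        # Gaussian similarity: 1.0 for identical values, near 0 for very different ones.
        gate = [math.exp(-((xa - xb) ** 2) / width ** 2) for xa, xb in zip(row_a, row_b)]
        # Residual boost: keep the original feature and add a gated copy of it.
        out_a.append([x + x * g for x, g in zip(row_a, gate)])
        out_b.append([x + x * g for x, g in zip(row_b, gate)])
    return out_a, out_b


if __name__ == "__main__":
    a, b = matching_gate([[1.0, 2.0]], [[1.0, 5.0]])
    print(a[0][0], b[0][0])  # prints 2.0 2.0
```

In the example, channel 0 matches across the pair and is doubled in both branches, while channel 1 differs strongly and passes through almost unchanged. The residual form means a gate of zero never suppresses a feature, so later layers still see the original signal.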
author2 |
Wang Gang |
format |
Theses and Dissertations |
author |
Varior, Rahul Rama |
title |
Learning representations for human re-identification |
publishDate |
2017 |
url |
http://hdl.handle.net/10356/70912 |