Person re-identification using visual analytic and deep learning
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2022
Online Access: https://hdl.handle.net/10356/155679
Institution: Nanyang Technological University
Summary: Person re-identification is an important computer vision task. It is used in real-world applications where a smart surveillance system must track human movement across a network of video cameras, greatly reducing the search time compared with manual screening of videos by humans. In this thesis, we aim to address this fundamental and challenging computer vision task. In a nutshell, a person re-identification system encodes the person's identity information into a representation extracted from a probe image, and subsequently compares this representation with the representations of an array of gallery images. Typically, the distances between the probe representation and the gallery representations are measured and sorted; the images whose representations are closest to the probe image are then retrieved and presented.
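The retrieval step described above can be sketched in a few lines. The feature vectors, image names and the use of plain Euclidean distance here are illustrative assumptions, not the thesis implementation:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_gallery(probe, gallery):
    """Sort gallery entries by distance to the probe representation.

    `gallery` is a list of (image_id, feature_vector) pairs;
    the closest matches come first.
    """
    return sorted(gallery, key=lambda item: euclidean(probe, item[1]))

# Toy 3-D embeddings standing in for deep feature vectors
probe = [0.9, 0.1, 0.3]
gallery = [
    ("img_a", [0.1, 0.8, 0.7]),
    ("img_b", [0.85, 0.15, 0.35]),  # near-duplicate of the probe
    ("img_c", [0.5, 0.5, 0.5]),
]
ranking = rank_gallery(probe, gallery)
print([img for img, _ in ranking])  # img_b ranks first
```

In practice the embeddings come from the trained network and the distance metric may be cosine or a learned metric, but the sort-by-distance retrieval loop is the same.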
There are many challenges in person re-identification. For example, pose variations, occlusions, ambient lighting changes, viewpoint differences between cameras, and low image resolution and quality are common problems that affect the reliability of the vision task. In addition, different pedestrians with similar appearance, such as similar outfits and colors, may confuse the vision system. Last but not least, annotating person re-identification datasets is both costly and time-consuming.
We address the above-mentioned problems using deep neural networks, covering three different aspects of the person re-identification task. First, we design a strong model that encodes the image into a discriminative feature vector, so that it can retrieve images of the same identity while rejecting those of similar-looking pedestrians. The model is also able to overcome the various challenges of occlusion, viewpoint changes, lighting differences, etc. Second, we enhance the model by incorporating human parsing, so that the features can be learned more precisely. We also provide additional person attribute annotations for two common large-scale datasets, so that more crucial and useful attributes can be queried. Third, we work on person re-identification domain adaptation, so that the knowledge captured by models trained on publicly available datasets can be transferred to a new dataset collected for a new application, removing the need for expensive and time-consuming annotation. The following paragraphs elaborate on the three proposed solutions.
In our first work, we propose the Attribute Attention Network (AANet), a multi-task deep neural network consisting of a global feature network, a part feature network and an attention feature network. These networks learn pedestrian features at three different levels: the global level, local body regions, and attribute-inferred feature map locations. We also generate heatmaps from the person attributes and combine them into a global attention map to further enhance feature learning and obtain a stronger representation. Because the attributes are learned explicitly, we can perform additional queries on the gallery images: for example, we can first retrieve images using the probe image, then refine the retrieved results with attribute filters to remove challenging false positives. The proposed model has a total of four tasks, whose weights are optimized using homoscedastic uncertainty learning. With this strong model, we outperform most state-of-the-art methods on multiple datasets.

In our second work, we propose the Attribute Parsing Network (APNet), which integrates human parsing into our re-identification model so that accurate semantic segmentation of the human body is available to the feature extraction layers. We also use the parsing information to align the input images vertically, which further strengthens the body part network. In addition, the semantic segmentation reduces background clutter and thus mitigates over-fitting to the background. Lastly, we provide additional important person attributes for two commonly used large-scale person re-identification datasets, including backpack colors, printed graphic patterns/logos on clothing, and dual clothing colors; these attributes allow better differentiation between pedestrians.
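The homoscedastic uncertainty weighting used to balance AANet's four tasks can be sketched as follows. This is a minimal pure-Python illustration of the standard uncertainty-weighted multi-task loss; the loss values and function name are assumptions for demonstration, not the thesis code:

```python
import math

def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine task losses using learned homoscedastic uncertainty.

    Each task i contributes exp(-s_i) * L_i + s_i, where
    s_i = log(sigma_i^2) is a learnable log-variance. Tasks whose
    uncertainty grows are automatically down-weighted, while the
    + s_i term stops the model from inflating all uncertainties.
    """
    total = 0.0
    for loss, s in zip(task_losses, log_vars):
        total += math.exp(-s) * loss + s
    return total

# Four hypothetical task losses (e.g. global, part, attention, attribute)
losses = [1.2, 0.8, 0.5, 0.9]
log_vars = [0.0, 0.0, 0.0, 0.0]  # sigma^2 = 1 for every task initially
print(uncertainty_weighted_loss(losses, log_vars))  # ~3.4 with unit weights
```

In a real training loop the `log_vars` would be trainable parameters updated by back-propagation alongside the network weights.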
The proposed APNet outperforms most state-of-the-art methods. In our third work, we propose the Collaborative Learning Mutual Network (CLM-Net), a person re-identification domain adaptation framework. CLM-Net transfers the knowledge gained from a published dataset (the source domain) to a new dataset (the target domain) collected for a new application, without the expensive and time-consuming annotation. CLM-Net is a collaborative learning framework consisting of two student models and two teacher models. The student models' weights are updated via back-propagation, whereas the teacher models' weights are updated as an exponential moving average of the student models' weights. We obtain pseudo labels for the target domain using the DBSCAN algorithm, and we propose to exploit the unlabeled data via contrastive learning. To enhance the discriminative power of the representation, we stack five local body part tasks and a saliency task onto the baseline global task. We test our method on three large datasets and obtain excellent results that outperform the state of the art.

In summary, we provide the AANet, APNet and CLM-Net deep learning models to address person re-identification in three different aspects: 1) feature extraction and attribute learning, 2) the integration of human parsing and attribute annotation, and 3) domain adaptation. We achieve excellent results, outperforming most state-of-the-art methods on multiple large-scale datasets.
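The exponential-moving-average teacher update used in CLM-Net can be sketched as below. The flat weight lists and the decay value are illustrative assumptions; real models update per-parameter tensors the same way:

```python
def ema_update(teacher_weights, student_weights, alpha=0.999):
    """Update teacher parameters as an exponential moving average
    of the student's.

    The teacher changes slowly (alpha close to 1), smoothing out
    the noisy per-step updates of the back-propagated student.
    """
    return [alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher_weights, student_weights)]

# Toy example with two scalar "weights" and a small decay for visibility
teacher = [0.0, 1.0]
student = [1.0, 0.0]
teacher = ema_update(teacher, student, alpha=0.9)
print(teacher)  # teacher moves 10% of the way toward the student
```

Because the teacher is a slow average of the student, its predictions are more stable, which makes it a better source of supervision targets for the unlabeled target domain.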