Camera domain transfer for video-based person re-identification
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University, 2022
Online Access: https://hdl.handle.net/10356/159290
Institution: Nanyang Technological University
Summary: Person re-identification (Re-ID), also known as pedestrian re-identification, is a technology that has emerged in recent years in the area of intelligent video analysis, and belongs to the category of image and video processing and analysis in complex environments. Depending on the research object, person Re-ID can be divided into two types: image-based and video-based. The subject of this dissertation is video-based person Re-ID, which operates on surveillance videos. Compared to methods that use single-frame images, methods that use surveillance videos have two advantages: 1) more information is available, including the appearance, posture and movement of the person; 2) they are closer to real-world application scenarios, since people captured by surveillance cameras are also presented in the form of video. However, due to the environment and the limitations of the equipment, there are instances where videos of the same identity are less similar than videos of different identities. This is a common problem in cross-camera person Re-ID. Besides, feature learning based on deep learning methods is prone to overfitting on relatively small-scale video datasets.
To address the problems mentioned above, this dissertation proposes a network based on StarGAN v2. The proposed method, CaTSGAN, tackles the video style variations caused by different cameras by transferring videos from one camera domain to the others, thereby eliminating camera-specific style variations. CaTSGAN learns the relationships among multiple camera domains with a single model and generates extra cross-camera training samples for person Re-ID, serving as a data augmentation approach that smooths video style disparities and increases data diversity to counter overfitting. Compared to the baseline, the proposed method achieves a nearly 10-point improvement in mean average precision (mAP), with Rank-5 and Rank-10 accuracies of 97.6% and 99.1%, respectively, demonstrating the feasibility and effectiveness of the method.
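For illustration, the sketch below shows how such camera-domain transfer could be used as data augmentation for video Re-ID. It is a minimal PyTorch sketch, not the thesis's actual CaTSGAN implementation: the generator `G(frames, style)`, the mapping network, and the names `camera_transfer`, `augment_batch`, `NUM_CAMERAS`, and `STYLE_DIM` are assumptions introduced here, and the style-code design only loosely follows StarGAN v2.

```python
# Minimal sketch (assumed names and shapes, not the thesis's actual code):
# a StarGAN v2-style mapping network produces a per-camera style code, and a
# pretrained generator G(frames, style) re-renders a clip in another camera's style.
import torch
import torch.nn as nn

NUM_CAMERAS = 6   # number of camera domains in the dataset (assumed)
STYLE_DIM = 64    # size of the style code (assumed)
LATENT_DIM = 16   # size of the random latent code fed to the mapping network (assumed)

class MappingNetwork(nn.Module):
    """Maps a latent code z to a style code, with one output head per camera domain."""
    def __init__(self, latent_dim=LATENT_DIM, style_dim=STYLE_DIM, num_domains=NUM_CAMERAS):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(256, style_dim) for _ in range(num_domains))

    def forward(self, z, y):
        h = self.shared(z)
        # select, for each sample, the head of its target camera label y
        return torch.stack([self.heads[int(c)](h[i]) for i, c in enumerate(y)])

def camera_transfer(frames, target_cam, generator, mapping):
    """Translate one clip of shape (T, C, H, W) into the style of target_cam.

    A single style code is shared by all frames so the translated clip stays
    temporally consistent.
    """
    z = torch.randn(1, LATENT_DIM)
    style = mapping(z, target_cam.view(1)).expand(frames.size(0), -1)
    with torch.no_grad():
        return generator(frames, style)

def augment_batch(clips, labels, cams, generator, mapping):
    """Append, for every real clip, a copy transferred to a random *other* camera.

    The generated clip keeps its original identity label, so it serves as an
    extra cross-camera training sample for the Re-ID model.
    """
    fake_clips = []
    for clip, cam in zip(clips, cams):
        target = torch.randint(0, NUM_CAMERAS, (1,))
        while target.item() == int(cam):           # force a different camera domain
            target = torch.randint(0, NUM_CAMERAS, (1,))
        fake_clips.append(camera_transfer(clip, target, generator, mapping))
    aug_clips = torch.cat([clips, torch.stack(fake_clips)])
    aug_labels = torch.cat([labels, labels])       # identities are preserved
    return aug_clips, aug_labels

if __name__ == "__main__":
    # Smoke test with an identity stand-in for the (pretrained) generator.
    generator = lambda x, s: x
    mapping = MappingNetwork()
    clips = torch.randn(4, 8, 3, 128, 64)          # (batch, frames, C, H, W)
    labels = torch.randint(0, 100, (4,))           # person identities
    cams = torch.randint(0, NUM_CAMERAS, (4,))     # source camera of each clip
    aug_clips, aug_labels = augment_batch(clips, labels, cams, generator, mapping)
    print(aug_clips.shape, aug_labels.shape)       # (8, 8, 3, 128, 64), (8,)
```

In actual training, the generated clips would come from the trained camera-transfer generator and be mixed into the Re-ID training set, which is what smooths camera style disparities and increases cross-camera data diversity.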