Camera domain transfer for video-based person re-identification

Bibliographic Details
Main Author: Ding, Bangjie
Other Authors: Tan Yap Peng
Format: Thesis-Master by Coursework
Language:English
Published: Nanyang Technological University 2022
Subjects:
Online Access:https://hdl.handle.net/10356/159290
Institution: Nanyang Technological University
Description
Summary: Person re-identification (Re-ID), also known as pedestrian re-identification, is a technology that has emerged in the area of intelligent video analysis in recent years, and belongs to the category of image and video processing and analysis in complex environments. Depending on the research object, person Re-ID can be divided into two types: image-based and video-based. This dissertation focuses on video-based person Re-ID, which operates on surveillance videos. Compared to methods that use single-frame images, methods that use surveillance videos have two advantages: 1) more information is available, including the appearance, posture, and movement of the person; 2) they are closer to real-world application scenarios, as people captured by surveillance cameras are also presented in the form of video. However, due to the environment and the limitations of the equipment, there are instances where videos of the same identity are less similar than videos of different identities. This is a common problem in cross-camera person Re-ID. In addition, feature learning based on deep learning methods is prone to overfitting on relatively small-scale video datasets. To address these problems, a network based on StarGAN v2 is proposed in this dissertation. The proposed method, CaTSGAN, addresses the video style variations caused by different cameras by transferring videos from one camera domain to the others, thereby eliminating the style variations. CaTSGAN can learn the relationships among multiple camera domains with a single model and generate extra cross-camera training samples for person Re-ID, serving as a data augmentation approach that smooths video style disparities and increases data diversity against overfitting. Compared to the baseline, the proposed method achieves a nearly 10-point improvement in mean average precision (mAP), with Rank-5 and Rank-10 reaching 97.6% and 99.1% accuracy, respectively.
This demonstrates the feasibility and effectiveness of the method.
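The augmentation idea described in the abstract can be sketched in code. The sketch below is a hypothetical illustration, not the thesis's actual CaTSGAN: a trained StarGAN v2-style generator would perform the learned camera-style mapping, but here a simple deterministic per-camera color shift (`fake_style_transfer`, an invented placeholder) stands in for it. The point is the augmentation loop: each video clip keeps its identity label while copies of it are re-styled into every other camera domain, expanding the training set across camera styles.

```python
import numpy as np

# Hypothetical sketch of cross-camera data augmentation for video Re-ID.
# A real system would use a trained generator G(clip, target_cam); here a
# per-camera color shift is a stand-in for the learned camera style.

NUM_CAMERAS = 4  # assumed number of camera domains in the dataset


def fake_style_transfer(clip: np.ndarray, target_cam: int) -> np.ndarray:
    """Placeholder for the generator: apply a deterministic per-camera shift."""
    shift = (target_cam * 17) % 64  # stand-in for the target camera's style
    return np.clip(clip.astype(np.int16) + shift, 0, 255).astype(np.uint8)


def augment_with_camera_transfer(clip: np.ndarray, source_cam: int,
                                 num_cameras: int = NUM_CAMERAS):
    """Return the original clip plus one re-styled copy per other camera.

    Each augmented sample keeps the same person identity label but adopts
    the style of another camera domain, smoothing cross-camera style gaps.
    """
    samples = [(clip, source_cam)]
    for cam in range(num_cameras):
        if cam != source_cam:
            samples.append((fake_style_transfer(clip, cam), cam))
    return samples


# An 8-frame 32x16 RGB clip captured by camera 0.
clip = np.random.randint(0, 256, size=(8, 32, 16, 3), dtype=np.uint8)
augmented = augment_with_camera_transfer(clip, source_cam=0)
print(len(augmented))  # 1 original + 3 transferred clips -> 4
```

In a full pipeline, the transferred clips would be fed to the Re-ID feature network alongside the originals, which is the data-diversity effect the abstract credits for reducing overfitting.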