CARF-Net: CNN attention and RNN fusion network for video-based person reidentification

Bibliographic Details
Main Authors: Prasad, Dilip Kumar, Kansal, Kajal, Venkata, Subramanyam, Kankanhalli, Mohan
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2019
Subjects:
Online Access:https://hdl.handle.net/10356/105466
http://hdl.handle.net/10220/48712
http://dx.doi.org/10.1117/1.JEI.28.2.023036
Institution: Nanyang Technological University
Description
Summary: Video-based person reidentification is a challenging and important task in surveillance-based applications. Toward this, several shallow and deep networks have been proposed. However, the performance of existing shallow networks does not generalize well on large datasets. To improve the generalization ability, we propose a shallow end-to-end network that incorporates two-stream convolutional neural networks, discriminative visual attention, and a recurrent neural network with triplet and softmax losses to learn spatiotemporal fusion features. To effectively use both spatial and temporal information, we apply spatial, temporal, and spatiotemporal pooling. In addition, we contribute a large dataset of airborne videos for person reidentification, named DJI01. It includes various challenging conditions, such as occlusion, illumination changes, people with similar clothes, and the same people on different days. We perform elaborate qualitative and quantitative analyses to demonstrate the robust performance of the proposed model.
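
Note: The record does not include the authors' code. The PyTorch sketch below is only a minimal illustration of how the building blocks named in the summary (two-stream per-frame CNNs, soft visual attention, an RNN over frames, and joint softmax/triplet training) could be wired together. All layer sizes, the choice of a GRU, the attention form, and all names (CARFNetSketch, rgb_cnn, flow_cnn) are assumptions for illustration, not the published architecture.

    import torch
    import torch.nn as nn

    class CARFNetSketch(nn.Module):
        """Illustrative sketch: per-frame two-stream CNN features, a soft spatial
        attention map, RNN-based temporal fusion, and pooled embeddings that can be
        trained with softmax (identity) and triplet losses."""

        def __init__(self, num_ids=100, feat_dim=128):
            super().__init__()
            # Two per-frame streams (e.g., appearance and motion); small CNNs for brevity.
            self.rgb_cnn = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.flow_cnn = nn.Sequential(
                nn.Conv2d(2, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            )
            # Soft spatial attention over the concatenated feature map.
            self.attn = nn.Conv2d(128, 1, 1)
            # RNN fuses per-frame descriptors across time.
            self.rnn = nn.GRU(128, feat_dim, batch_first=True)
            self.classifier = nn.Linear(feat_dim, num_ids)

        def forward(self, rgb, flow):
            # rgb: (B, T, 3, H, W), flow: (B, T, 2, H, W)
            b, t = rgb.shape[:2]
            frame_feats = []
            for i in range(t):
                f = torch.cat([self.rgb_cnn(rgb[:, i]), self.flow_cnn(flow[:, i])], dim=1)
                attn_logits = self.attn(f)                                   # (B, 1, H', W')
                a = torch.softmax(attn_logits.flatten(1), dim=1).view_as(attn_logits)
                # Attention-weighted spatial pooling -> per-frame descriptor.
                frame_feats.append((f * a).sum(dim=(2, 3)))                  # (B, 128)
            seq = torch.stack(frame_feats, dim=1)                            # (B, T, 128)
            out, _ = self.rnn(seq)                                           # (B, T, feat_dim)
            embedding = out.mean(dim=1)                                      # temporal pooling
            logits = self.classifier(embedding)                              # for softmax loss
            return embedding, logits

In training, one would typically combine nn.CrossEntropyLoss on the logits with nn.TripletMarginLoss on the embeddings, mirroring the joint softmax and triplet objective described in the summary.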