Cross-scale generative adversarial network for crowd density estimation from images

Bibliographic Details
Main Authors: Zhang, Gaowei; Pan, Yue; Zhang, Limao; Tiong, Robert Lee Kong
Other Authors: School of Civil and Environmental Engineering
Format: Article
Language: English
Published: 2022
Online Access: https://hdl.handle.net/10356/161128
Institution: Nanyang Technological University
Description
Summary: This research develops a cross-scale convolutional spatial generative adversarial network (CSGAN) to estimate crowd density accurately from images. It consists of two similar generators, one for whole-image feature extraction and the other for patch-scale feature extraction. An encoder–decoder structure is employed to generate density maps from input images or patches. Additionally, a new objective function for crowd counting, called cross-scale consistency pursuit, is developed; it combines an adversarial loss, an L2 loss, a perceptual loss, and a consistency loss to make the generated density maps more realistic and closer to the ground truth. The effectiveness of the proposed CSGAN is verified on two public datasets. Results indicate that, compared with other state-of-the-art methods on the test datasets, the new objective function achieves the best evaluation-metric values in both low-density and high-density crowd scenes. Moreover, the proposed CSGAN is more practical and flexible owing to its lower computational complexity, and its estimation capability is significantly improved even with a small amount of training data. Overall, this research contributes a novel computer vision approach, together with a new objective function, for generating density maps from cross-scale crowd images, making the counting process more accurate and efficient.
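
As an illustration only, the four-term objective described in the summary might be sketched as below. This is not the authors' implementation. The record does not give the loss weights, the patch layout, or the exact form of the consistency term, so the four-quadrant split, the mean-squared form of the consistency loss, the unit weights, and every function name here are assumptions. The only standard fact relied on is that a crowd count is obtained by summing a density map.

```python
import numpy as np

def crowd_count(density_map):
    """Estimated count is the sum over the density map."""
    return float(np.sum(density_map))

def l2_loss(pred, target):
    """Mean squared difference between two density maps."""
    return float(np.mean((pred - target) ** 2))

def stitch_patches(patches):
    """Reassemble four quadrant maps (TL, TR, BL, BR) into one map.
    The four-quadrant layout is an assumption, not taken from the record."""
    tl, tr, bl, br = patches
    return np.vstack([np.hstack([tl, tr]), np.hstack([bl, br])])

def consistency_loss(whole_pred, patch_preds):
    """Assumed form: penalise disagreement between the whole-image map
    and the map stitched together from patch-scale predictions."""
    return l2_loss(whole_pred, stitch_patches(patch_preds))

def total_loss(adv, l2, perceptual, consistency,
               weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four terms; the weights are hypothetical."""
    w_adv, w_l2, w_perc, w_cons = weights
    return w_adv * adv + w_l2 * l2 + w_perc * perceptual + w_cons * consistency

# Toy usage: a uniform whole-image map and four matching patch maps.
whole = np.full((4, 4), 0.5)
patches = [np.full((2, 2), 0.5) for _ in range(4)]
cons = consistency_loss(whole, patches)   # identical maps, so no penalty
count = crowd_count(whole)
```

In this sketch the adversarial and perceptual terms are passed in as precomputed scalars, since they would depend on a discriminator and a pretrained feature network that the record does not specify.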