Crowd counting for intelligent video surveillance

Bibliographic Details
Main Author: Chen, Pengyu
Other Authors: Lap-Pui Chau
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2022
Online Access: https://hdl.handle.net/10356/154607
Institution: Nanyang Technological University
Description
Summary: Surveillance plays an important role in maintaining public safety. Particularly during the recent COVID-19 pandemic, the flow of people has needed to be monitored and strictly controlled at all times. However, manual observation is time-consuming, and it is difficult for a human to estimate crowd size accurately, especially in complex scenes. Machine vision can help automate this task. With the rise of convolutional neural networks and deep learning, visual detectors can distinguish more types of objects and have found a wider range of applications. Their performance has also gradually improved, making it possible to use surveillance cameras to complete crowd detection tasks simultaneously. The video can be processed frame by frame, with each frame treated as an image, and the detector then automatically outputs prediction data such as the total number of people and the locations and sizes of their faces. In this dissertation, several object detection methods and the basic principles of the convolutional neural network are briefly introduced as background. A simple and effective network, with some modifications, is then discussed as the baseline of our method. A self-training approach that enables the network to be trained using only point-level annotations is also introduced. Our method combines this training approach with the baseline to benefit from their error correction and crowd analysis capabilities. Experimental results on the NWPU dataset show that our method is effective in crowd counting, crowd localization, and size prediction tasks.
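The frame-by-frame processing described in the summary can be illustrated with a minimal Python sketch. This is not the thesis implementation: the `Detection` record, the `count_per_frame` function, the `min_score` threshold, and the `toy_detector` stand-in are all hypothetical names introduced here for illustration. A real system would substitute a trained network for the stand-in.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence


@dataclass
class Detection:
    """One predicted face (hypothetical record, not from the thesis)."""
    x: float      # column of the predicted location, in pixels
    y: float      # row of the predicted location, in pixels
    size: float   # predicted size, in pixels
    score: float  # detector confidence in [0, 1]


Frame = Sequence[Sequence[int]]              # a grayscale image as rows of pixels
Detector = Callable[[Frame], List[Detection]]


def count_per_frame(frames: Iterable[Frame],
                    detector: Detector,
                    min_score: float = 0.5) -> List[int]:
    """Run the detector on each frame and count the confident detections."""
    counts = []
    for frame in frames:
        kept = [d for d in detector(frame) if d.score >= min_score]
        counts.append(len(kept))
    return counts


def toy_detector(frame: Frame) -> List[Detection]:
    """Stand-in detector (assumption): each nonzero pixel is one detection,
    and the pixel value divided by 255 is used as its confidence."""
    return [
        Detection(x=col, y=row, size=1.0, score=value / 255)
        for row, line in enumerate(frame)
        for col, value in enumerate(line)
        if value > 0
    ]


if __name__ == "__main__":
    video = [
        [[0, 255], [200, 0]],  # two confident detections
        [[0, 0], [100, 0]],    # one detection below the threshold
    ]
    print(count_per_frame(video, toy_detector))
```

Keeping the detector behind a plain callable means the counting loop does not depend on how the predictions are produced, so the same loop would work with any model that returns locations, sizes, and scores.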