Physical Distancing and Mask Wearing Behavior Dataset Generator from CCTV Footages Using YOLOv8

Computer simulations using agent-based approach aimed at modeling human behavior require a robust dataset derived from actual observation to serve as ground truth. This paper details an approach for developing a movement behavior dataset generator from CCTV footages with respect to two health-relate...

Full description

Saved in:
Bibliographic Details
Main Authors: Abao, Roland P., Estuar, Ma. Regina Justina, Abu, Patricia Angela R
Format: text
Published: Archīum Ateneo 2023
Subjects:
Online Access:https://archium.ateneo.edu/discs-faculty-pubs/393
https://doi.org/10.1007/978-3-031-43129-6_29
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Ateneo De Manila University
Description
Summary:Computer simulations using agent-based approach aimed at modeling human behavior require a robust dataset derived from actual observation to serve as ground truth. This paper details an approach for developing a movement behavior dataset generator from CCTV footages with respect to two health-related behaviors: face mask wearing and physical distancing, while addressing the privacy concerns of confidential CCTV data. A two-stage YOLOv8-based cascaded approach was implemented for object tracking and detection. The first stage involves tracking of individuals in the video feed to determine physical distancing behavior using the pre-trained YOLOv8 xLarge model paired with Bot-SORT multi-object tracker and OpenCV Perspective-n-Point pose estimation. The second stage involves determining the mask wearing behavior of the tracked individuals using the best-performing model among the five YOLOv8 models (nano, small, medium, large, and xLarge), each trained for 50 epochs on a custom CCTV dataset. Results show that the custom-trained xLarge model performed the best on the mask detection task with the following metric scores: mAP50 = 0.94; mAP50-95 = 0.63; and F1 = 0.872. The faces of all the tracked individuals are blurred-out in the resulting video frames to preserve the privacy of the CCTV data. Finally, the developed system is able to generate the corresponding mask-distancing behavior dataset and annotated output videos from the input CCTV raw footages.