Video summarization of person of interest (POI)

Full description

With the increase in available video content, there is a greater need for the management of this digital media. Video summarization aims to create a succinct and comprehensive synopsis by selecting key details from video media [1]. Most video summarization models summarize the entire video without any prior trimming of key details, which may lead to excess information being provided. Conventional video summarization models provide only one summarized statement for the entire video, which often results in a very broad description of the activities that happened in the video.

The first contribution of this project is a new, enhanced video summarization model that provides additional information centered on a particular person of interest (POI). A new pipeline is developed for video summarization of the POI using deep learning-based methods, providing further insights into the POI's face, actions and clothing. Face detection and recognition are first used to identify the POI within the video. Once the POI's identity has been established through face recognition, clothing descriptors are applied to the POI to identify what they are wearing. Finally, the video is trimmed to include only the parts containing the POI, enabling more precise video summarization that accurately derives the key activities the POI is involved in. Multiple state-of-the-art face detection, mask classification and face recognition models have been explored and integrated into the new pipeline to achieve this goal. Convolutional Neural Networks (CNNs) such as ResNet 50 are used for classification, the Multi-Task Cascaded Convolutional Network (MTCNN) is used for face recognition, and the object detection model You Only Look Once (YOLO) is used for human extraction. K-means clustering is used for color extraction of the POI's clothes.

The second contribution is the improvement of the accuracy of the individual components' ability to extract and classify the various objects, which justifies the selections made for the pipeline. Face detection using DLIB achieved an accuracy of 88.2%, while the enhanced facial recognition model, which combines MTCNN with DLIB-based face detection, achieved an accuracy of 94.9%, a roughly 6% increase in overall accuracy. The mask classification model, trained using ResNet 50, achieved an accuracy of 98.11%. An overall evaluation of the model and its use cases concludes the report, together with possible further extensions such as real-time video detection and optimisation of the descriptors.

Keywords: Convolutional Neural Network, Face Detection, Face Recognition, Object Detection, Video Summarization, You Only Look Once
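
The record does not include the project's implementation code; the sketch below is only an illustration of the face detection and recognition step described in the abstract, assuming the facenet-pytorch MTCNN detector and the dlib-based face_recognition package. The function name find_poi_faces and the variable poi_encoding are hypothetical, not taken from the report.

from facenet_pytorch import MTCNN
from PIL import Image
import face_recognition
import numpy as np

detector = MTCNN(keep_all=True)  # detect every face in a frame, not just the largest

def find_poi_faces(frame_path, poi_encoding, tolerance=0.6):
    """Return bounding boxes of faces in the frame that match the POI encoding."""
    frame = np.array(Image.open(frame_path).convert("RGB"))
    boxes, _ = detector.detect(frame)          # MTCNN boxes as (x1, y1, x2, y2)
    if boxes is None:
        return []
    # face_recognition expects (top, right, bottom, left) boxes
    locations = [(int(y1), int(x2), int(y2), int(x1)) for x1, y1, x2, y2 in boxes]
    encodings = face_recognition.face_encodings(frame, known_face_locations=locations)
    matches = face_recognition.compare_faces(encodings, poi_encoding, tolerance=tolerance)
    return [box for box, matched in zip(boxes, matches) if matched]

A reference encoding for the POI could be produced the same way from a single labelled photo, e.g. face_recognition.face_encodings(reference_image)[0].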

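The K-means colour extraction mentioned in the abstract could look roughly like the following scikit-learn sketch; it is an assumption-based illustration, not the report's code, and dominant_colours and clothing_crop are hypothetical names.

import numpy as np
from sklearn.cluster import KMeans

def dominant_colours(clothing_crop, k=3):
    """clothing_crop: H x W x 3 RGB array of the POI's clothing region."""
    pixels = clothing_crop.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    counts = np.bincount(km.labels_)                 # pixels assigned to each cluster
    order = np.argsort(counts)[::-1]                 # most frequent cluster first
    return km.cluster_centers_[order].astype(int)    # RGB centres, dominant first

In the pipeline described above, the clothing region would presumably come from the YOLO person box minus the face area; mapping each RGB centre to a colour name would be a separate lookup step.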
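The trimming step, which keeps only the segments where the POI was detected, might be sketched as follows with OpenCV; the frame-index bookkeeping and the poi_frames set are assumptions for illustration, not the report's method.

import cv2

def trim_to_poi(video_path, poi_frames, out_path="poi_only.mp4"):
    """Write a new video containing only the frames whose indices are in poi_frames."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index in poi_frames:   # set of frame indices where the POI was found
            writer.write(frame)
        index += 1
    cap.release()
    writer.release()
    return out_path
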
Bibliographic Details
Main Author: Lit, Laura Pei Lin
Other Authors: Lee Bu Sung, Francis
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2022
Subjects: Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/156530
Institution: Nanyang Technological University
Record ID: sg-ntu-dr.10356-156530 (DSpace)
School: School of Computer Science and Engineering
Collaborating Agency: Home Team Science and Technology Agency (HTX)
Contact: EBSLEE@ntu.edu.sg
Degree: Bachelor of Science in Data Science and Artificial Intelligence
Project Code: SCSE21-0546
Citation: Lit, L. P. L. (2021). Video summarization of person of interest (POI). Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/156530
Held by: NTU Library, Singapore
Collection: DR-NTU