Video summarization of person of interest (POI)

Full description

With the increase in available video content, there is a greater need for the management of this digital media. Video summarization aims to create a succinct and comprehensive synopsis by selecting key details from video media [1]. Most video summarization models summarize the entire video without any prior trimming of key details, which may lead to excess information being provided. Conventional video summarization models provide only one summarized statement for the entire video, which often results in a very broad description of the activities that happened in the video.

The first contribution of this project is a new, enhanced video summarization model that provides additional information centered on a particular person of interest (POI). A new pipeline is developed for video summarization of the POI using deep learning-based methods, providing further insights into the POI's face, actions and clothing. Face detection and recognition are first used to identify the POI within the video. Once the POI's identity has been established through face recognition, clothing descriptors are applied to the POI to identify what they are wearing. Finally, the video is trimmed to include only the parts containing the POI, enabling more precise video summarization that accurately derives the key activities the POI is involved in. Multiple state-of-the-art face detection, mask classification and face recognition models have been explored and integrated into the new pipeline to achieve this goal. Convolutional Neural Networks (CNNs) such as ResNet 50 are used for classification, the Multi-Task Cascaded Convolutional Network (MTCNN) is used for face recognition, and the object detection model You Only Look Once (YOLO) is used for human extraction. K-means clustering is used for color extraction of the POI's clothes.

The second contribution is the improvement of the accuracy of the individual components' ability to extract and classify the various objects, which justifies the selections made for the pipeline. Face detection using DLIB achieved an accuracy of 88.2%, while the enhanced facial recognition model, which combines MTCNN with DLIB-based face detection, achieved an accuracy of 94.9%, a roughly 6% increase in overall accuracy. The mask classification model, trained using ResNet 50, achieved an accuracy of 98.11%. An overall evaluation of the model and its use cases concludes the report, together with possible further extensions such as real-time video detection and optimisation of the descriptors.

Keywords: Convolutional Neural Network, Face Detection, Face Recognition, Object Detection, Video Summarization, You Only Look Once
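
The record does not include the project's implementation code; the sketch below is only an illustration of the face detection and recognition step described in the abstract, assuming the facenet-pytorch MTCNN detector and the dlib-based face_recognition package. The function name find_poi_faces and the variable poi_encoding are hypothetical, not taken from the report.

from facenet_pytorch import MTCNN
from PIL import Image
import face_recognition
import numpy as np

detector = MTCNN(keep_all=True)  # detect every face in a frame, not just the largest

def find_poi_faces(frame_path, poi_encoding, tolerance=0.6):
    """Return bounding boxes of faces in the frame that match the POI encoding."""
    frame = np.array(Image.open(frame_path).convert("RGB"))
    boxes, _ = detector.detect(frame)          # MTCNN boxes as (x1, y1, x2, y2)
    if boxes is None:
        return []
    # face_recognition expects (top, right, bottom, left) boxes
    locations = [(int(y1), int(x2), int(y2), int(x1)) for x1, y1, x2, y2 in boxes]
    encodings = face_recognition.face_encodings(frame, known_face_locations=locations)
    matches = face_recognition.compare_faces(encodings, poi_encoding, tolerance=tolerance)
    return [box for box, matched in zip(boxes, matches) if matched]

A reference encoding for the POI could be produced the same way from a single labelled photo, e.g. face_recognition.face_encodings(reference_image)[0].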

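The K-means colour extraction mentioned in the abstract could look roughly like the following scikit-learn sketch; it is an assumption-based illustration, not the report's code, and dominant_colours and clothing_crop are hypothetical names.

import numpy as np
from sklearn.cluster import KMeans

def dominant_colours(clothing_crop, k=3):
    """clothing_crop: H x W x 3 RGB array of the POI's clothing region."""
    pixels = clothing_crop.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    counts = np.bincount(km.labels_)                 # pixels assigned to each cluster
    order = np.argsort(counts)[::-1]                 # most frequent cluster first
    return km.cluster_centers_[order].astype(int)    # RGB centres, dominant first

In the pipeline described above, the clothing region would presumably come from the YOLO person box minus the face area; mapping each RGB centre to a colour name would be a separate lookup step.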
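The trimming step, which keeps only the segments where the POI was detected, might be sketched as follows with OpenCV; the frame-index bookkeeping and the poi_frames set are assumptions for illustration, not the report's method.

import cv2

def trim_to_poi(video_path, poi_frames, out_path="poi_only.mp4"):
    """Write a new video containing only the frames whose indices are in poi_frames."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index in poi_frames:   # set of frame indices where the POI was found
            writer.write(frame)
        index += 1
    cap.release()
    writer.release()
    return out_path
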
Bibliographic Details
Main Author: Lit, Laura Pei Lin
Other Authors: Lee Bu Sung, Francis
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2022
Subjects: Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/156530
Institution: Nanyang Technological University
Record ID: sg-ntu-dr.10356-156530 (DSpace)
School: School of Computer Science and Engineering
Collaborating Agency: Home Team Science and Technology Agency (HTX)
Contact: EBSLEE@ntu.edu.sg
Degree: Bachelor of Science in Data Science and Artificial Intelligence
Project Code: SCSE21-0546
Citation: Lit, L. P. L. (2021). Video summarization of person of interest (POI). Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/156530
Held by: NTU Library, Singapore
Collection: DR-NTU