Multi-view human action recognition in meeting scenarios

Bibliographic Details
Main Author: Yin, Haixiang
Other Authors: Tan Yap Peng
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2021
Subjects:
Online Access:https://hdl.handle.net/10356/153357
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-153357
record_format dspace
spelling sg-ntu-dr.10356-1533572023-07-04T16:24:59Z Multi-view human action recognition in meeting scenarios Yin, Haixiang Tan Yap Peng School of Electrical and Electronic Engineering EYPTan@ntu.edu.sg Engineering::Electrical and electronic engineering Due to the continuous development of deep learning and computer vision, the recognition of human actions has become one of the most popular research topics, and various methods have been proposed to tackle this problem. This project implements a Multi-View Human Action Recognition System with a focus on the spatio-temporal localization of actions in a meeting scenario. Existing human action recognition systems tend to suffer from human-to-human or human-to-object occlusion, which can greatly reduce recognition accuracy. Most existing multi-view action recognition systems also do not focus on the spatio-temporal localization of actions. In meeting scenarios, however, occlusion is a frequent phenomenon, and once it occurs it can persist for a long time, so existing methods and datasets do not work well in this setting. This project aims to address these limitations. We first process a multi-view meeting dataset, the AMI (Augmented Multi-party Interaction) meeting corpus, to make it usable for multi-view action recognition. We then use the SlowFast network as the backbone for action recognition and use Torchreid (a PyTorch library for deep-learning person re-identification) for instance association after learning features of the input from different camera viewpoints. Finally, the system applies late fusion to merge information from the left and right viewpoints into the center viewpoint, which suffers from occlusion, improving the system's ability to handle the occlusion problem. 
The method proposed in this project improves mAP (mean average precision) on the AMI meeting corpus by up to nearly 10 percent compared to single-view recognition approaches. Master of Science (Signal Processing) 2021-11-24T05:20:14Z 2021-11-24T05:20:14Z 2021 Thesis-Master by Coursework Yin, H. (2021). Multi-view human action recognition in meeting scenarios. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/153357 https://hdl.handle.net/10356/153357 en application/pdf Nanyang Technological University
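The late-fusion step described in the abstract can be illustrated with a minimal sketch. This is an assumption-laden toy example, not the thesis's exact method: the score values, equal weighting, and per-view score format are all hypothetical, standing in for the class scores a SlowFast backbone would produce for one person instance already associated across views.

```python
import numpy as np

# Hypothetical per-class action scores for one person instance,
# as predicted independently from each camera viewpoint.
scores_center = np.array([0.2, 0.5, 0.3])  # occluded center view
scores_left   = np.array([0.6, 0.3, 0.1])
scores_right  = np.array([0.7, 0.2, 0.1])

def late_fuse(view_scores, weights=None):
    """Fuse per-view class scores by (optionally weighted) averaging,
    then renormalize so the fused scores sum to 1."""
    stacked = np.stack(view_scores)
    if weights is None:
        weights = np.ones(len(stacked)) / len(stacked)
    fused = np.average(stacked, axis=0, weights=weights)
    return fused / fused.sum()

fused = late_fuse([scores_center, scores_left, scores_right])
print(fused)  # side views outvote the occluded center view
```

In this toy setting the occluded center view alone favors class 1, but after fusing the left and right views the fused scores favor class 0, which is the intended effect of compensating for occlusion in the center viewpoint.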
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Electrical and electronic engineering
spellingShingle Engineering::Electrical and electronic engineering
Yin, Haixiang
Multi-view human action recognition in meeting scenarios
description Due to the continuous development of deep learning and computer vision, the recognition of human actions has become one of the most popular research topics, and various methods have been proposed to tackle this problem. This project implements a Multi-View Human Action Recognition System with a focus on the spatio-temporal localization of actions in a meeting scenario. Existing human action recognition systems tend to suffer from human-to-human or human-to-object occlusion, which can greatly reduce recognition accuracy. Most existing multi-view action recognition systems also do not focus on the spatio-temporal localization of actions. In meeting scenarios, however, occlusion is a frequent phenomenon, and once it occurs it can persist for a long time, so existing methods and datasets do not work well in this setting. This project aims to address these limitations. We first process a multi-view meeting dataset, the AMI (Augmented Multi-party Interaction) meeting corpus, to make it usable for multi-view action recognition. We then use the SlowFast network as the backbone for action recognition and use Torchreid (a PyTorch library for deep-learning person re-identification) for instance association after learning features of the input from different camera viewpoints. Finally, the system applies late fusion to merge information from the left and right viewpoints into the center viewpoint, which suffers from occlusion, improving the system's ability to handle the occlusion problem. The method proposed in this project improves mAP (mean average precision) on the AMI meeting corpus by up to nearly 10 percent compared to single-view recognition approaches.
author2 Tan Yap Peng
author_facet Tan Yap Peng
Yin, Haixiang
format Thesis-Master by Coursework
author Yin, Haixiang
author_sort Yin, Haixiang
title Multi-view human action recognition in meeting scenarios
title_short Multi-view human action recognition in meeting scenarios
title_full Multi-view human action recognition in meeting scenarios
title_fullStr Multi-view human action recognition in meeting scenarios
title_full_unstemmed Multi-view human action recognition in meeting scenarios
title_sort multi-view human action recognition in meeting scenarios
publisher Nanyang Technological University
publishDate 2021
url https://hdl.handle.net/10356/153357
_version_ 1772828549766447104