Human action recognition using artificial intelligence

Video action recognition is one of the specific tasks of video understanding, which aims to generate an action label, containing a verb and a noun, for a given video segment. As many other video understanding tasks, video action recognition is continuously under exploration of researchers and is at...

Full description

Saved in:

Bibliographic Details
Main Author:	Wang, Haoyu
Other Authors:	Yap Kim Hui
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2022
Subjects:	Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Online Access:	https://hdl.handle.net/10356/157639
Tags:	Add Tag No Tags, Be the first to tag this record!

id	sg-ntu-dr.10356-157639
record_format	dspace
spelling	sg-ntu-dr.10356-1576392023-07-07T19:35:48Z Human action recognition using artificial intelligence Wang, Haoyu Yap Kim Hui School of Electrical and Electronic Engineering EKHYap@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Video action recognition is one of the specific tasks of video understanding, which aims to generate an action label, containing a verb and a noun, for a given video segment. As many other video understanding tasks, video action recognition is continuously under exploration of researchers and is at the same time, extensively applied to many real-life applications, like automatic driving, human-robot interaction, etc. Former researchers have established several different methods, including hand-crafted features, two-stream networks, 3D CNNs, etc. The fundamental difference among those methods is that they use different spatial-temporal modelling to capture both the spatial details and temporal relation in video segments, which are the keys for video tasks. However, due to the complexity of modelling such information, trade-off must always be made between a high accuracy and computational cost. Beside the prediction model, dataset is also crucial to video tasks as its scale and variety in action categories definitely help models pre-trained on it work better when deployed in real-life applications. In this project, a survey about various former action recognition method and action recognition dataset was conducted in order to comprehensively understand the problems mentioned above, and to evaluate and compare across the performance of the existing state-of-the-art methods. Then an efficient deep learning model was proposed to take advantage of 1) the cheap computation of 2D CNNs, 2) the ability of long-range temporal modelling of two-stream networks and 3D CNNs. The largest dataset in egocentric vision was selected as the benchmark dataset to compare the proposed model over its baseline. Extensive experiments were designed and conducted to analyse the results, which showed the proposed method has single digit accuracy improvement over the state-of-the-art. This report consists of the insights gained from survey about video action recognition models and dataset, the design of an efficient models, the experiment results with comparisons and discussions, and most important, the reflection about the design and development of the model and its performance. A short conclusion and a glimpse towards future works are made at the end. Bachelor of Engineering (Electrical and Electronic Engineering) 2022-05-21T12:43:05Z 2022-05-21T12:43:05Z 2022 Final Year Project (FYP) Wang, H. (2022). Human action recognition using artificial intelligence. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/157639 https://hdl.handle.net/10356/157639 en application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
spellingShingle	Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Wang, Haoyu Human action recognition using artificial intelligence
description	Video action recognition is one of the specific tasks of video understanding, which aims to generate an action label, containing a verb and a noun, for a given video segment. As many other video understanding tasks, video action recognition is continuously under exploration of researchers and is at the same time, extensively applied to many real-life applications, like automatic driving, human-robot interaction, etc. Former researchers have established several different methods, including hand-crafted features, two-stream networks, 3D CNNs, etc. The fundamental difference among those methods is that they use different spatial-temporal modelling to capture both the spatial details and temporal relation in video segments, which are the keys for video tasks. However, due to the complexity of modelling such information, trade-off must always be made between a high accuracy and computational cost. Beside the prediction model, dataset is also crucial to video tasks as its scale and variety in action categories definitely help models pre-trained on it work better when deployed in real-life applications. In this project, a survey about various former action recognition method and action recognition dataset was conducted in order to comprehensively understand the problems mentioned above, and to evaluate and compare across the performance of the existing state-of-the-art methods. Then an efficient deep learning model was proposed to take advantage of 1) the cheap computation of 2D CNNs, 2) the ability of long-range temporal modelling of two-stream networks and 3D CNNs. The largest dataset in egocentric vision was selected as the benchmark dataset to compare the proposed model over its baseline. Extensive experiments were designed and conducted to analyse the results, which showed the proposed method has single digit accuracy improvement over the state-of-the-art. This report consists of the insights gained from survey about video action recognition models and dataset, the design of an efficient models, the experiment results with comparisons and discussions, and most important, the reflection about the design and development of the model and its performance. A short conclusion and a glimpse towards future works are made at the end.
author2	Yap Kim Hui
author_facet	Yap Kim Hui Wang, Haoyu
format	Final Year Project
author	Wang, Haoyu
author_sort	Wang, Haoyu
title	Human action recognition using artificial intelligence
title_short	Human action recognition using artificial intelligence
title_full	Human action recognition using artificial intelligence
title_fullStr	Human action recognition using artificial intelligence
title_full_unstemmed	Human action recognition using artificial intelligence
title_sort	human action recognition using artificial intelligence
publisher	Nanyang Technological University
publishDate	2022
url	https://hdl.handle.net/10356/157639
_version_	1772826857724444672

Human action recognition using artificial intelligence

Similar Items