Human action recognition using artificial intelligence

Video action recognition is a specific video understanding task that aims to generate an action label, consisting of a verb and a noun, for a given video segment. Like many other video understanding tasks, video action recognition is under continuous exploration by researchers and is at...

Bibliographic Details
Main Author: Wang, Haoyu
Other Authors: Yap Kim Hui
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2022
Subjects:
Online Access: https://hdl.handle.net/10356/157639
Institution: Nanyang Technological University
id sg-ntu-dr.10356-157639
record_format dspace
spelling sg-ntu-dr.10356-157639 2023-07-07T19:35:48Z Human action recognition using artificial intelligence Wang, Haoyu Yap Kim Hui School of Electrical and Electronic Engineering EKHYap@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Bachelor of Engineering (Electrical and Electronic Engineering) 2022-05-21T12:43:05Z 2022-05-21T12:43:05Z 2022 Final Year Project (FYP) Wang, H. (2022). Human action recognition using artificial intelligence. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/157639 https://hdl.handle.net/10356/157639 en application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
spellingShingle Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Wang, Haoyu
Human action recognition using artificial intelligence
description Video action recognition is a specific video understanding task that aims to generate an action label, consisting of a verb and a noun, for a given video segment. Like many other video understanding tasks, it is under continuous exploration by researchers and is at the same time widely applied in real-life applications such as autonomous driving and human-robot interaction. Earlier researchers have established several different methods, including hand-crafted features, two-stream networks, and 3D CNNs. The fundamental difference among these methods lies in the spatio-temporal modelling they use to capture both the spatial details and the temporal relations in video segments, which are the keys to video tasks. However, because modelling such information is complex, a trade-off must always be made between high accuracy and low computational cost. Besides the prediction model, the dataset is also crucial to video tasks, as its scale and variety of action categories help models pre-trained on it work better when deployed in real-life applications. In this project, a survey of earlier action recognition methods and action recognition datasets was conducted to understand the problems mentioned above comprehensively and to evaluate and compare the performance of existing state-of-the-art methods. An efficient deep learning model was then proposed to take advantage of 1) the cheap computation of 2D CNNs and 2) the long-range temporal modelling ability of two-stream networks and 3D CNNs. The largest dataset in egocentric vision was selected as the benchmark on which to compare the proposed model against its baseline. Extensive experiments were designed and conducted to analyse the results, which showed that the proposed method achieves a single-digit accuracy improvement over the state of the art.
This report presents the insights gained from the survey of video action recognition models and datasets, the design of an efficient model, the experimental results with comparisons and discussion, and, most importantly, reflections on the design and development of the model and its performance. It closes with a short conclusion and a glimpse of future work.
author2 Yap Kim Hui
author_facet Yap Kim Hui
Wang, Haoyu
format Final Year Project
author Wang, Haoyu
author_sort Wang, Haoyu
title Human action recognition using artificial intelligence
title_short Human action recognition using artificial intelligence
title_full Human action recognition using artificial intelligence
title_fullStr Human action recognition using artificial intelligence
title_full_unstemmed Human action recognition using artificial intelligence
title_sort human action recognition using artificial intelligence
publisher Nanyang Technological University
publishDate 2022
url https://hdl.handle.net/10356/157639
_version_ 1772826857724444672