Human action recognition using artificial intelligence
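Main Author: | Wang, Haoyu |
---|---|
Other Authors: | Yap Kim Hui |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2022 |
Subjects: | Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision |
Online Access: | https://hdl.handle.net/10356/157639 |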
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-157639 |
---|---|
record_format |
dspace |
school |
School of Electrical and Electronic Engineering |
contact |
EKHYap@ntu.edu.sg |
degree |
Bachelor of Engineering (Electrical and Electronic Engineering) |
citation |
Wang, H. (2022). Human action recognition using artificial intelligence. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/157639 |
file_format |
application/pdf |
timestamps |
2022-05-21T12:43:05Z, 2023-07-07T19:35:48Z |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
topic |
Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision |
description |
Video action recognition is a specific video understanding task that aims to assign an action label, consisting of a verb and a noun, to a given video segment. Like many other video understanding tasks, it remains an active research area and is widely used in real-life applications such as autonomous driving and human-robot interaction. Previous researchers have established several families of methods, including hand-crafted features, two-stream networks and 3D CNNs. These methods differ fundamentally in how they model spatio-temporal information, that is, how they capture both the spatial details and the temporal relations in a video segment, which are key to video tasks. Because such information is complex to model, a trade-off must always be made between high accuracy and computational cost. Besides the prediction model, the dataset is also crucial to video tasks: greater scale and variety of action categories help models pre-trained on it perform better when deployed in real-life applications.
In this project, a survey of existing action recognition methods and datasets was conducted to understand the problems above in depth and to evaluate and compare the performance of existing state-of-the-art methods. An efficient deep learning model was then proposed to combine 1) the low computational cost of 2D CNNs and 2) the long-range temporal modelling ability of two-stream networks and 3D CNNs. The largest dataset in egocentric vision was selected as the benchmark for comparing the proposed model against its baseline. Extensive experiments were designed and conducted, and their analysis showed that the proposed method achieves a single-digit accuracy improvement over the state of the art.
This report presents the insights gained from the survey of video action recognition models and datasets, the design of an efficient model, the experimental results with comparisons and discussion, and, most importantly, a reflection on the design and development of the model and its performance. It closes with a short conclusion and an outlook on future work. |
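The abstract's point about trading accuracy against computational cost, and the appeal of cheap 2D CNNs, can be made concrete with simple arithmetic. The Python sketch below is not taken from the report and does not describe its proposed model: it only counts the weights and multiply-accumulate operations (MACs) of a single convolution layer applied to a video clip, assuming stride 1, "same" padding, no bias, and hypothetical sizes (an 8-frame clip of 56×56 feature maps with 64 channels). The helper name `conv_cost` is illustrative.

```python
def conv_cost(c_in, c_out, kernel, out_shape):
    """Return (weights, MACs) for a dense convolution layer without bias.

    kernel:    kernel extents, e.g. (3, 3) for 2D or (3, 3, 3) for 3D.
    out_shape: output extents; with stride 1 and "same" padding these
               equal the input extents.
    """
    kernel_size = 1
    for extent in kernel:
        kernel_size *= extent
    weights = c_in * c_out * kernel_size
    positions = 1
    for extent in out_shape:
        positions *= extent
    # Every output position reuses all weights once.
    return weights, weights * positions


T, H, W = 8, 56, 56  # hypothetical clip: 8 frames of 56x56 feature maps
C = 64               # hypothetical channel width (input = output)

# 2D CNN: one 3x3 kernel applied to each frame independently (no temporal mixing).
w2d, macs_per_frame = conv_cost(C, C, (3, 3), (H, W))
macs2d = T * macs_per_frame

# 3D CNN: a 3x3x3 kernel that also spans 3 neighbouring frames.
w3d, macs3d = conv_cost(C, C, (3, 3, 3), (T, H, W))

print(f"2D: {w2d} weights, {macs2d} MACs")
print(f"3D: {w3d} weights, {macs3d} MACs")
print(f"3D/2D cost ratio: {macs3d / macs2d:.1f}")
```

Under these assumptions the 3×3×3 layer needs three times the weights and three times the MACs of the per-frame 3×3 layer, because its kernel also covers three neighbouring frames. This per-layer factor is one reason 3D CNNs cost more than 2D CNNs, which is the gap that efficient 2D-based designs try to close while still modelling time.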
author2 |
Yap Kim Hui |
author |
Wang, Haoyu |