Towards efficient video-based action recognition: context-aware memory attention network

Given the prevalence of surveillance cameras in our daily lives, human action recognition from videos holds significant practical applications. A persistent challenge in this field is to develop more efficient models capable of real-time recognition with high accuracy for widespread implementation....

Bibliographic Details
Main Authors: Koh, Thean Chun, Yeo, Chai Kiat, Jing, Xuan, Sivadas, Sunil
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2024
Online Access: https://hdl.handle.net/10356/173795
Institution: Nanyang Technological University
Affiliations: School of Computer Science and Engineering; NCS Pte Ltd, Singapore
Subjects: Computer and Information Science; Action recognition; Deep learning
Type: Journal Article (Published version)
Journal: SN Applied Sciences, Vol. 5, Issue 12
Date Issued: 2023
ISSN: 2523-3971
DOI: 10.1007/s42452-023-05568-5
Scopus ID: 2-s2.0-85176339491
Citation: Koh, T. C., Yeo, C. K., Jing, X. & Sivadas, S. (2023). Towards efficient video-based action recognition: context-aware memory attention network. SN Applied Sciences, 5(12). https://dx.doi.org/10.1007/s42452-023-05568-5
Funding: This study is supported by the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contributions from Singapore Telecommunications Limited (Singtel), through the Singtel Cognitive and Artificial Intelligence Lab for Enterprises (SCALE@NTU).
Rights: © The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
File Format: application/pdf
Date Accessioned: 2024-02-27T07:45:54Z
Record Last Modified: 2024-03-01T15:36:36Z
Abstract: Given the prevalence of surveillance cameras in daily life, human action recognition from video has significant practical applications. A persistent challenge in this field is developing more efficient models that recognize actions in real time with high accuracy, so that they can be widely deployed. In this paper, we introduce a novel human action recognition model, the Context-Aware Memory Attention Network (CAMA-Net), which eliminates the need for optical flow extraction and 3D convolution, both of which are computationally intensive. By removing these components, CAMA-Net is more computationally efficient than many existing approaches. A pivotal component of CAMA-Net is the Context-Aware Memory Attention Module, an attention module that computes relevance scores between key-value pairs obtained from the 2D ResNet backbone, thereby establishing correspondences between video frames. To validate our method, we conduct experiments on four well-known action recognition datasets: ActivityNet, Diving48, HMDB51 and UCF101. The experimental results demonstrate the effectiveness of the proposed model, which surpasses existing 2D-CNN-based baseline models.
Article Highlights:
Recent human action recognition models are not yet ready for practical applications because of their high computational demands.
We propose a 2D-CNN-based human action recognition method to reduce the computational load.
The proposed method achieves competitive performance compared with most state-of-the-art (SOTA) 2D-CNN-based methods on public datasets.
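The abstract describes the Context-Aware Memory Attention Module only at a high level: per-frame features from a 2D ResNet backbone supply key-value pairs, and relevance scores between them link frames across time. The sketch below is a hypothetical, generic illustration of that idea using plain scaled dot-product attention over the time axis; it is not the authors' implementation. It assumes each frame has already been pooled to a single feature vector (the made-up numbers stand in for ResNet outputs), uses each vector directly as query, key and value instead of learned projections, and leaves out whatever makes the paper's module "context-aware". All function names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: shift by the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relevance_weights(frames: List[Vector]) -> List[List[float]]:
    """T x T matrix whose row t holds frame t's attention over all frames.

    Hypothetical simplification: each pooled frame feature serves directly
    as query and key (no learned projections); scores are scaled by 1/sqrt(d).
    """
    d = len(frames[0])
    return [softmax([dot(q, k) / math.sqrt(d) for k in frames]) for q in frames]


def memory_attention(frames: List[Vector]) -> List[Vector]:
    """Weight the frame features (used as values) by relevance, then add
    the aggregated context back to each frame as a residual."""
    weights = relevance_weights(frames)
    out = []
    for feat, row in zip(frames, weights):
        context = [sum(w * v[i] for w, v in zip(row, frames)) for i in range(len(feat))]
        out.append([f + c for f, c in zip(feat, context)])
    return out


def clip_descriptor(frames: List[Vector]) -> Vector:
    """Average the attended frame features over time into one clip vector."""
    attended = memory_attention(frames)
    n = len(attended)
    return [sum(col) / n for col in zip(*attended)]


if __name__ == "__main__":
    # Made-up 2-D "backbone" features for a 3-frame clip; frames 0 and 2 match.
    clip = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    for row in relevance_weights(clip):
        print([round(w, 3) for w in row])
    print(clip_descriptor(clip))
```

Matching frames (0 and 2) receive the highest mutual weights, which is the kind of cross-frame correspondence the abstract refers to. Because this attention operates on one vector per frame, its cost is roughly T squared dot products of length d for T frames, with no 3D convolution over the video volume; the paper's actual cost profile is not reproduced here.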