Towards efficient video-based action recognition: context-aware memory attention network
Given the prevalence of surveillance cameras in our daily lives, human action recognition from videos holds significant practical applications. A persistent challenge in this field is to develop more efficient models capable of real-time recognition with high accuracy for widespread implementation....
Saved in:
Main Authors: | Koh, Thean Chun, Yeo, Chai Kiat, Jing, Xuan, Sivadas, Sunil |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Article |
Language: | English |
Published: | 2024 |
Subjects: | Computer and Information Science; Action recognition; Deep learning |
Online Access: | https://hdl.handle.net/10356/173795 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-173795 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-1737952024-03-01T15:36:36Z Towards efficient video-based action recognition: context-aware memory attention network Koh, Thean Chun Yeo, Chai Kiat Jing, Xuan Sivadas, Sunil School of Computer Science and Engineering NCS Pte Ltd, Singapore Computer and Information Science Action recognition Deep learning Given the prevalence of surveillance cameras in our daily lives, human action recognition from videos has significant practical applications. A persistent challenge in this field is to develop more efficient models capable of real-time recognition with high accuracy for widespread deployment. In this paper, we introduce a novel human action recognition model named Context-Aware Memory Attention Network (CAMA-Net), which eliminates the need for computationally intensive optical flow extraction and 3D convolution. By removing these components, CAMA-Net achieves greater computational efficiency than many existing approaches. A pivotal component of CAMA-Net is the Context-Aware Memory Attention Module, an attention module that computes relevance scores between key-value pairs obtained from the 2D ResNet backbone, thereby establishing correspondences between video frames. To validate our method, we conduct experiments on four well-known action recognition datasets: ActivityNet, Diving48, HMDB51 and UCF101. The experimental results demonstrate the effectiveness of the proposed model, which surpasses existing 2D-CNN-based baseline models. Article Highlights: Recent human action recognition models are not yet ready for practical applications due to their high computation needs. We propose a 2D-CNN-based human action recognition method to reduce the computation load. The proposed method achieves competitive performance compared to most SOTA 2D-CNN-based methods on public datasets. 
Published version This study is supported by RIE2020 Industry Alignment Fund—Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contributions from Singapore Telecommunications Limited (Singtel), through Singtel Cognitive and Artificial Intelligence Lab for Enterprises (SCALE@NTU). 2024-02-27T07:45:54Z 2024-02-27T07:45:54Z 2023 Journal Article Koh, T. C., Yeo, C. K., Jing, X. & Sivadas, S. (2023). Towards efficient video-based action recognition: context-aware memory attention network. SN Applied Sciences, 5(12). https://dx.doi.org/10.1007/s42452-023-05568-5 2523-3971 https://hdl.handle.net/10356/173795 10.1007/s42452-023-05568-5 2-s2.0-85176339491 12 5 en SN Applied Sciences © The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Computer and Information Science; Action recognition; Deep learning |
spellingShingle |
Computer and Information Science Action recognition Deep learning Koh, Thean Chun Yeo, Chai Kiat Jing, Xuan Sivadas, Sunil Towards efficient video-based action recognition: context-aware memory attention network |
description |
Given the prevalence of surveillance cameras in our daily lives, human action recognition from videos has significant practical applications. A persistent challenge in this field is to develop more efficient models capable of real-time recognition with high accuracy for widespread deployment. In this paper, we introduce a novel human action recognition model named Context-Aware Memory Attention Network (CAMA-Net), which eliminates the need for computationally intensive optical flow extraction and 3D convolution. By removing these components, CAMA-Net achieves greater computational efficiency than many existing approaches. A pivotal component of CAMA-Net is the Context-Aware Memory Attention Module, an attention module that computes relevance scores between key-value pairs obtained from the 2D ResNet backbone, thereby establishing correspondences between video frames. To validate our method, we conduct experiments on four well-known action recognition datasets: ActivityNet, Diving48, HMDB51 and UCF101. The experimental results demonstrate the effectiveness of the proposed model, which surpasses existing 2D-CNN-based baseline models. Article Highlights: Recent human action recognition models are not yet ready for practical applications due to their high computation needs. We propose a 2D-CNN-based human action recognition method to reduce the computation load. The proposed method achieves competitive performance compared to most SOTA 2D-CNN-based methods on public datasets. |
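The abstract describes an attention module that scores the relevance of key-value pairs drawn from per-frame 2D ResNet features. The paper's exact formulation is not reproduced in this record, but a minimal sketch of that general idea — scaled dot-product relevance between a query frame and memory keys, used to aggregate memory values — might look as follows (function and variable names are illustrative, not the authors'):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_attention(query, keys, values):
    """Aggregate memory values by the relevance of each key to the query.

    query:  (d,)   feature vector for the current frame
    keys:   (t, d) key features from t earlier frames
    values: (t, d) value features from the same frames
    returns a (d,) context vector summarizing relevant frames.
    """
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)   # (t,) relevance of each memory slot
    weights = softmax(scores)            # normalize scores to a distribution
    return weights @ values              # weighted sum over memory values
```

In this sketch, frames whose keys align with the current query contribute more to the context vector, which is one way such a module can establish correspondences across frames without optical flow or 3D convolution.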
author2 |
School of Computer Science and Engineering |
author_facet |
School of Computer Science and Engineering; Koh, Thean Chun; Yeo, Chai Kiat; Jing, Xuan; Sivadas, Sunil |
format |
Article |
author |
Koh, Thean Chun; Yeo, Chai Kiat; Jing, Xuan; Sivadas, Sunil |
author_sort |
Koh, Thean Chun |
title |
Towards efficient video-based action recognition: context-aware memory attention network |
title_short |
Towards efficient video-based action recognition: context-aware memory attention network |
title_full |
Towards efficient video-based action recognition: context-aware memory attention network |
title_fullStr |
Towards efficient video-based action recognition: context-aware memory attention network |
title_full_unstemmed |
Towards efficient video-based action recognition: context-aware memory attention network |
title_sort |
towards efficient video-based action recognition: context-aware memory attention network |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/173795 |
_version_ |
1794549499246411776 |