Towards efficient video-based action recognition: context-aware memory attention network
Given the prevalence of surveillance cameras in our daily lives, human action recognition from videos holds significant practical applications. A persistent challenge in this field is to develop more efficient models capable of real-time recognition with high accuracy for widespread implementation....
Saved in:
Main Authors: | Koh, Thean Chun, Yeo, Chai Kiat, Jing, Xuan, Sivadas, Sunil |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Article |
Language: | English |
Published: | 2024 |
Subjects: | Computer and Information Science; Action recognition; Deep learning |
Online Access: | https://hdl.handle.net/10356/173795 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-173795 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-1737952024-03-01T15:36:36Z Towards efficient video-based action recognition: context-aware memory attention network Koh, Thean Chun Yeo, Chai Kiat Jing, Xuan Sivadas, Sunil School of Computer Science and Engineering NCS Pte Ltd, Singapore Computer and Information Science Action recognition Deep learning Given the prevalence of surveillance cameras in our daily lives, human action recognition from videos has significant practical applications. A persistent challenge in this field is to develop more efficient models capable of real-time recognition with high accuracy for widespread deployment. In this paper, we introduce a novel human action recognition model named Context-Aware Memory Attention Network (CAMA-Net), which eliminates the need for computationally intensive optical flow extraction and 3D convolution. By removing these components, CAMA-Net achieves greater computational efficiency than many existing approaches. A pivotal component of CAMA-Net is the Context-Aware Memory Attention Module, an attention module that computes relevance scores between key-value pairs obtained from the 2D ResNet backbone, thereby establishing correspondences between video frames. To validate our method, we conduct experiments on four well-known action recognition datasets: ActivityNet, Diving48, HMDB51 and UCF101. The experimental results demonstrate the effectiveness of the proposed model, which surpasses existing 2D-CNN-based baseline models. Article Highlights: Recent human action recognition models are not yet ready for practical applications due to their high computation needs. We propose a 2D-CNN-based human action recognition method to reduce the computation load. The proposed method achieves competitive performance compared to most SOTA 2D-CNN-based methods on public datasets. 
Published version This study is supported by RIE2020 Industry Alignment Fund—Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contributions from Singapore Telecommunications Limited (Singtel), through Singtel Cognitive and Artificial Intelligence Lab for Enterprises (SCALE@NTU). 2024-02-27T07:45:54Z 2024-02-27T07:45:54Z 2023 Journal Article Koh, T. C., Yeo, C. K., Jing, X. & Sivadas, S. (2023). Towards efficient video-based action recognition: context-aware memory attention network. SN Applied Sciences, 5(12). https://dx.doi.org/10.1007/s42452-023-05568-5 2523-3971 https://hdl.handle.net/10356/173795 10.1007/s42452-023-05568-5 2-s2.0-85176339491 12 5 en SN Applied Sciences © The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Computer and Information Science; Action recognition; Deep learning |
spellingShingle |
Computer and Information Science Action recognition Deep learning Koh, Thean Chun Yeo, Chai Kiat Jing, Xuan Sivadas, Sunil Towards efficient video-based action recognition: context-aware memory attention network |
description |
Given the prevalence of surveillance cameras in our daily lives, human action recognition from videos has significant practical applications. A persistent challenge in this field is to develop more efficient models capable of real-time recognition with high accuracy for widespread deployment. In this paper, we introduce a novel human action recognition model named Context-Aware Memory Attention Network (CAMA-Net), which eliminates the need for computationally intensive optical flow extraction and 3D convolution. By removing these components, CAMA-Net achieves greater computational efficiency than many existing approaches. A pivotal component of CAMA-Net is the Context-Aware Memory Attention Module, an attention module that computes relevance scores between key-value pairs obtained from the 2D ResNet backbone, thereby establishing correspondences between video frames. To validate our method, we conduct experiments on four well-known action recognition datasets: ActivityNet, Diving48, HMDB51 and UCF101. The experimental results demonstrate the effectiveness of the proposed model, which surpasses existing 2D-CNN-based baseline models. Article Highlights: Recent human action recognition models are not yet ready for practical applications due to their high computation needs. We propose a 2D-CNN-based human action recognition method to reduce the computation load. The proposed method achieves competitive performance compared to most SOTA 2D-CNN-based methods on public datasets. |
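The abstract describes an attention module that scores the relevance of key-value pairs drawn from per-frame 2D ResNet features. The paper's exact formulation is not reproduced in this record, but a minimal sketch of that general idea — scaled dot-product relevance between a query frame and memory keys, used to aggregate memory values — might look as follows (function and variable names are illustrative, not the authors'):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_attention(query, keys, values):
    """Aggregate memory values by the relevance of each key to the query.

    query:  (d,)   feature vector for the current frame
    keys:   (t, d) key features from t earlier frames
    values: (t, d) value features from the same frames
    returns a (d,) context vector summarizing relevant frames.
    """
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)   # (t,) relevance of each memory slot
    weights = softmax(scores)            # normalize scores to a distribution
    return weights @ values              # weighted sum over memory values
```

In this sketch, frames whose keys align with the current query contribute more to the context vector, which is one way such a module can establish correspondences across frames without optical flow or 3D convolution.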
author2 |
School of Computer Science and Engineering |
author_facet |
School of Computer Science and Engineering; Koh, Thean Chun; Yeo, Chai Kiat; Jing, Xuan; Sivadas, Sunil |
format |
Article |
author |
Koh, Thean Chun; Yeo, Chai Kiat; Jing, Xuan; Sivadas, Sunil |
author_sort |
Koh, Thean Chun |
title |
Towards efficient video-based action recognition: context-aware memory attention network |
title_short |
Towards efficient video-based action recognition: context-aware memory attention network |
title_full |
Towards efficient video-based action recognition: context-aware memory attention network |
title_fullStr |
Towards efficient video-based action recognition: context-aware memory attention network |
title_full_unstemmed |
Towards efficient video-based action recognition: context-aware memory attention network |
title_sort |
towards efficient video-based action recognition: context-aware memory attention network |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/173795 |
_version_ |
1794549499246411776 |