Deep historical long short-term memory network for action recognition

Bibliographic details
Main authors: Cai, Jiaxin; Hu, Junlin; Tang, Xin; Hung, Tzu-Yi; Tan, Yap Peng
Other authors: School of Electrical and Electronic Engineering
Format: Article
Language: English
Published: 2022
Online access: https://hdl.handle.net/10356/160970
Institution: Nanyang Technological University
Description
Summary: Human action recognition has received increasing interest recently and is very useful in sports video analysis. Most action recognition methods for sports focus on recognizing which sport is being performed. However, recognizing the specific action in a video is important for analyzing some sports footage, such as tennis matches. Hence, in this paper, we propose a deep historical long short-term memory network for video-based tennis action recognition and general action recognition. First, spatial representations are extracted from each frame using a pre-trained convolutional neural network (CNN). To describe the temporal information, a stacked multi-layer long short-term memory (LSTM) network is used. Because the historical information of past frames is important for modeling the action, we add a historical information layer on top of the multi-layer LSTM network. For each video, an updating scheme combines the LSTM hidden state at time t with the historical feature updated at time t-1 to produce a historical feature, which is then used for classification. Experiments on the benchmark datasets demonstrate that our method, using only raw RGB video, can outperform the state-of-the-art baselines for both general action recognition and tennis action recognition.
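
The historical information layer described in the summary can be illustrated with a minimal sketch. The abstract does not state the actual updating scheme, so the rule below is an assumption: a convex blend H_t = (1 - alpha) * H_{t-1} + alpha * h_t, where h_t stands for the top-layer LSTM hidden state at time t. The function name `historical_feature`, the parameter `alpha`, and the toy hidden states are all hypothetical and only show how a running historical feature could be folded over the frames of one video.

```python
from typing import List, Sequence


def historical_feature(hidden_states: Sequence[Sequence[float]],
                       alpha: float = 0.5) -> List[float]:
    """Fold per-frame hidden states into one video-level feature.

    ASSUMPTION: the update is the convex blend
        H_t = (1 - alpha) * H_{t-1} + alpha * h_t
    used here as a stand-in; the abstract does not give the real scheme.
    """
    if not hidden_states:
        raise ValueError("need at least one time step")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")

    # Initialise the history with the first hidden state (assumed choice).
    history = [float(v) for v in hidden_states[0]]
    for h_t in hidden_states[1:]:
        if len(h_t) != len(history):
            raise ValueError("all hidden states must share one dimension")
        # Blend the previous historical feature with the current state.
        history = [(1.0 - alpha) * prev + alpha * cur
                   for prev, cur in zip(history, h_t)]
    return history


# Toy stand-ins for the top-layer LSTM hidden states of a 3-frame clip.
toy_states = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
video_feature = historical_feature(toy_states, alpha=0.5)
print(video_feature)
```

In a full pipeline the hidden states would come from a stacked LSTM fed with per-frame CNN features, and the resulting vector would go to a classifier; with `alpha = 1.0` the sketch reduces to using only the last hidden state.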