Skeleton-based human activity understanding

Human activity understanding is an important research problem due to its relevance to a wide range of applications. Recently, 3D skeleton-based activity analysis has become popular due to its succinctness, robustness, and view-invariant representation. In this thesis, we focus on human activity underst...

Bibliographic Details
Main Author:	Liu, Jun
Other Authors:	Alex Kot
Format:	Theses and Dissertations
Language:	English
Published:	2019
Subjects:	Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access:	https://hdl.handle.net/10356/104427 http://hdl.handle.net/10220/49510

id	sg-ntu-dr.10356-104427
record_format	dspace
spelling	sg-ntu-dr.10356-104427 2023-07-04T16:43:03Z Skeleton-based human activity understanding Liu, Jun Alex Kot School of Electrical and Electronic Engineering Research Techno Plaza Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Doctor of Philosophy 2019-08-01T05:03:26Z 2019-12-06T21:32:32Z 2019-08-01T05:03:26Z 2019-12-06T21:32:32Z 2019 Thesis Liu, J. (2019). Skeleton-based human activity understanding. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/104427 http://hdl.handle.net/10220/49510 10.32657/10220/49510 en 166 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
spellingShingle	Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Liu, Jun Skeleton-based human activity understanding
description	Human activity understanding is an important research problem because of its relevance to a wide range of applications. Recently, 3D skeleton-based activity analysis has become popular owing to its succinctness, robustness, and view-invariant representation. This thesis focuses on human activity understanding in 3D skeleton sequences. Recent works have used recurrent neural networks (RNNs) and long short-term memory (LSTM) networks to model the temporal dependencies between the 3D positional configurations of human body joints for better analysis of human activities in 3D skeletal data. As the first work of this thesis, we apply recurrent analysis to the spatial domain as well as the temporal domain, so that the hidden sources of action-related information within human skeleton sequences are analyzed in both domains simultaneously. Based on the pictorial structure of Kinect's skeletal data, an effective tree-structure-based traversal framework is also proposed. To deal with noise in the skeletal data, a new gating mechanism is introduced within the LSTM module, with which the network learns the reliability of the sequential data and accordingly adjusts the effect of the input data on the update of the long-term context representation stored in the unit's memory cell. Comprehensive experimental results on seven challenging benchmark datasets for human action recognition demonstrate the effectiveness of the proposed method.
In skeleton-based action recognition, not all skeletal joints are informative for activity analysis, and irrelevant joints often introduce noise that can degrade performance, so more attention should be paid to the informative ones. However, the original LSTM network has no explicit attention ability. In our second work, we propose a new class of LSTM network, the global context-aware attention LSTM, for skeleton-based action recognition, which selectively focuses on the informative joints in each frame by using a global context memory cell. The proposed method achieves state-of-the-art performance on five challenging datasets for skeleton-based action recognition.
The two works above address action recognition in well-segmented skeleton sequences, in which each sequence contains one action sample whose class must be recognized. In the third work, we focus on online action prediction in untrimmed streaming skeleton data, in which each sequence contains multiple action samples and the class label of the ongoing activity must be recognized when only part of it has been observed. A dilated convolutional network is introduced to model the motion dynamics in the temporal dimension via a sliding window over the temporal axis. Because the observed part of the ongoing action varies significantly in temporal scale across time steps, a novel window scale selection method is proposed, which makes the network focus on the performed part of the ongoing action and suppresses possible interference from previous actions. The proposed approach is evaluated on four challenging datasets, and extensive experiments demonstrate its effectiveness for skeleton-based online action prediction.
author2	Alex Kot
author_facet	Alex Kot Liu, Jun
format	Theses and Dissertations
author	Liu, Jun
author_sort	Liu, Jun
title	Skeleton-based human activity understanding
title_short	Skeleton-based human activity understanding
title_full	Skeleton-based human activity understanding
title_fullStr	Skeleton-based human activity understanding
title_full_unstemmed	Skeleton-based human activity understanding
title_sort	skeleton-based human activity understanding
publishDate	2019
url	https://hdl.handle.net/10356/104427 http://hdl.handle.net/10220/49510
_version_	1772826721578385408
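
The abstract's third work mentions a dilated convolutional network applied over a sliding temporal window. As a rough illustration of that general technique only (not the thesis's network), the sketch below applies a causal dilated 1-D convolution to a toy per-frame signal and computes how many past frames a stack of such layers can see. The function names, kernel weights, and dilation schedule are all assumptions made for this example.

```python
from typing import List, Sequence


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Causal dilated 1-D convolution (hypothetical helper, not from the thesis).

    output[t] combines signal[t], signal[t - dilation], signal[t - 2 * dilation], ...
    so it never looks at frames that have not been observed yet.
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # frames before the start of the stream are treated as zero
                acc += weight * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of frames (current one included) visible to the top of a layer stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    # Toy stream: one scalar per frame stands in for a joint coordinate.
    frames = [float(i) for i in range(8)]
    hidden = dilated_conv1d(frames, kernel=[0.5, 0.5], dilation=1)
    hidden = dilated_conv1d(hidden, kernel=[0.5, 0.5], dilation=2)
    print(hidden)
    # Doubling the dilation per layer widens the temporal window quickly.
    print(receptive_field(kernel_size=2, dilations=[1, 2, 4]))
```

With a kernel size of 2, each added layer widens the visible window by its dilation, so layers at different depths cover different temporal scales. How the thesis selects among such scales is not described in this record.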
