SkeFi: cross-modal knowledge transfer for wireless skeleton-based action recognition


Bibliographic Details
Main Author: Huang, Shunyu
Other Authors: Xie, Lihua
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University, 2024
Subjects:
GCN
Online Access:https://hdl.handle.net/10356/174832
Institution: Nanyang Technological University
Description
Summary: Skeleton-based action recognition can effectively solve the problem of reduced classification accuracy caused by background clutter. However, prevalent skeleton datasets predominantly rely on cameras to capture RGB frames and annotate skeletal keypoints, making them susceptible to ambient lighting fluctuations and potential privacy violations. To mitigate these challenges, leveraging non-invasive sensors such as LiDAR and mmWave for wireless human sensing emerges as a viable alternative. Nevertheless, the small size of the datasets collected with these non-invasive sensing methodologies makes the direct application of RGB-based skeleton action classification models suboptimal. Moreover, keypoints extracted from non-invasive data lack the precision of those obtained from RGB modalities, yielding skeleton data affected by noise and information loss. To address these issues, our work transfers cross-modal knowledge acquired from the data-rich RGB modality to our classification task; we name this framework SkeFi. To handle specific instances of information loss, we integrate an enhanced Temporal Correlation Adaptive Graph Convolution (TC-AGC). Additionally, our research demonstrates that the intrinsic strength of multiscale temporal modeling can be augmented by integrating ESPNet modules. By combining TC-AGC with this improved temporal modeling and applying transfer learning, our framework achieves superior performance across three modalities (RGB, mmWave, and LiDAR).
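The thesis itself is not reproduced in this record, so the exact TC-AGC formulation is unavailable here. As a rough orientation only, the general idea behind an *adaptive* graph convolution on skeleton data is to blend a fixed skeletal adjacency (bone connections) with a data-driven adjacency computed from joint-feature similarity, then aggregate neighbor features. The sketch below is a minimal numpy illustration of that generic idea; the function name, the similarity measure, and the blending parameter `alpha` are all assumptions, not the author's method.

```python
import numpy as np

def adaptive_graph_conv(x, a_static, w, alpha=0.5):
    """Hypothetical adaptive graph convolution step (illustrative only).

    x        : (T, V, C_in) skeleton features (T frames, V joints).
    a_static : (V, V) fixed skeletal adjacency (bone connections).
    w        : (C_in, C_out) projection weights.
    alpha    : blend between the static and data-driven graphs (assumed).
    """
    # Data-driven adjacency: joint-to-joint feature similarity,
    # averaged over time, row-normalized with a softmax.
    mean_feat = x.mean(axis=0)                          # (V, C_in)
    sim = mean_feat @ mean_feat.T                       # (V, V)
    sim = np.exp(sim - sim.max(axis=1, keepdims=True))  # stable softmax
    a_dyn = sim / sim.sum(axis=1, keepdims=True)

    # Blend the two graphs, then aggregate neighbors and project features:
    # out[t, u, d] = sum_{v, c} A[u, v] * x[t, v, c] * W[c, d]
    a = (1 - alpha) * a_static + alpha * a_dyn          # (V, V)
    return np.einsum('uv,tvc,cd->tud', a, x, w)         # (T, V, C_out)
```

In a trained model the projection weights (and often the blending itself) would be learned, and the data-driven graph would be recomputed per layer; this sketch only shows the shape of the computation, not the thesis's temporal-correlation variant.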