WiFi and vision multimodal self-supervised learning for human activity recognition
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Master by Coursework |
Language: | English |
Published: | Nanyang Technological University, 2023 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/171660 |
Institution: | Nanyang Technological University |
Summary: | Human activity recognition has become increasingly important in various fields such as security, medical care, entertainment, and home furnishing. In this context, we propose a novel Multimodal Pretrain fine-tune (MPF) network architecture for device-free human activity recognition using commercial WiFi-enabled IoT devices and cameras.
The MPF network architecture is designed to leverage large amounts of unlabeled data for self-supervised pretraining and a small amount of labeled data for fine-tuning. In the pretraining stage, we fuse WiFi channel state information (CSI) and video data and use a multi-head self-attention network for masked data modeling: the network performs self-supervised learning by predicting the representation of the masked data. In the subsequent fine-tuning stage, we add a GRU layer on top of the pre-trained self-attention network to extract temporal features across frames, and a linear layer outputs the human activity recognition result.
Experiments show that the proposed MPF network outperforms the baseline network on two human activity recognition datasets and demonstrates strong robustness and generalization ability. Furthermore, the MPF network correctly classifies common human actions and who performs them. |
---|
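
The masked data modeling step described in the summary can be illustrated with a minimal, dependency-free sketch. It is not the thesis implementation: concatenation as the fusion method, a zero vector as the mask token, the 25% default mask ratio, reconstruction of the fused input itself (rather than a learned representation) as the target, and all function names (`fuse_frames`, `mask_frames`, `masked_mse`) are assumptions made for illustration. The self-attention encoder, GRU, and linear layers are omitted.

```python
import random


def fuse_frames(csi_frames, video_frames):
    """Concatenate per-frame CSI and video feature vectors.

    Assumes both modalities are already time-aligned, one vector per frame.
    """
    if len(csi_frames) != len(video_frames):
        raise ValueError("modalities must have the same number of frames")
    return [list(c) + list(v) for c, v in zip(csi_frames, video_frames)]


def mask_frames(frames, mask_ratio=0.25, rng=None):
    """Replace a random subset of frames with zero vectors.

    Returns the masked copy and the sorted indices of the masked frames.
    The zero mask token and the default ratio are illustrative assumptions.
    """
    rng = rng or random.Random()
    n_masked = max(1, round(len(frames) * mask_ratio))
    masked_idx = sorted(rng.sample(range(len(frames)), n_masked))
    masked = [list(f) for f in frames]
    for i in masked_idx:
        masked[i] = [0.0] * len(frames[i])
    return masked, masked_idx


def masked_mse(predicted, target, masked_idx):
    """Mean squared error computed over the masked frames only."""
    total, count = 0.0, 0
    for i in masked_idx:
        for p, t in zip(predicted[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count


# Toy data: 4 frames, 2 CSI features and 1 video feature per frame.
csi = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
video = [[1.0], [2.0], [3.0], [4.0]]

fused = fuse_frames(csi, video)
masked, idx = mask_frames(fused, mask_ratio=0.5, rng=random.Random(0))

# A real encoder would predict the masked frames from the visible ones;
# here the masked input stands in for an untrained model's prediction.
loss = masked_mse(masked, fused, idx)
```

Computing the loss only at masked positions forces a model to infer the hidden frames from the visible context, which is what allows pretraining on unlabeled data.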