WiFi and vision multimodal self-supervised learning for human activity recognition

Bibliographic Details
Main Author: Tang, Shijie
Other Authors: Xie Lihua
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2023
Online Access: https://hdl.handle.net/10356/171660
Institution: Nanyang Technological University
Description
Summary: Human activity recognition has become increasingly important in fields such as security, medical care, entertainment, and home furnishing. In this context, we propose a novel Multimodal Pretrain fine-tune (MPF) network architecture for device-free human activity recognition using commercial WiFi-enabled IoT devices and cameras. The MPF architecture is designed to use large amounts of unlabeled data for self-supervised pretraining and only a small amount of labeled data for fine-tuning. In the pretraining stage, we fuse WiFi channel state information (CSI) with video data and apply a multi-head self-attention network for masked data modeling: the network learns in a self-supervised manner by predicting the representation of the masked data. In the subsequent fine-tuning stage, we add a GRU layer on top of the pretrained self-attention network to extract temporal features across frames, and a linear layer outputs the activity recognition result. Experiments show that the proposed MPF network outperforms the baseline network on two human activity recognition datasets and demonstrates strong robustness and generalization ability. Furthermore, the MPF network correctly classifies common human actions and identifies who performs them.
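
The masked data modeling step in the summary can be illustrated with a minimal sketch of its bookkeeping only: fusing the two modalities frame by frame, hiding a random subset of frames, and scoring a prediction on the hidden frames alone. This is not the thesis implementation. The concatenation-based fusion, the zero mask value, the mean-squared-error loss on raw features, and every function name below are assumptions made for illustration, and a trivial mean-of-visible-frames predictor stands in for the multi-head self-attention network.

```python
import random

MASK_VALUE = 0.0  # assumed placeholder written into masked frames


def fuse(csi_frames, video_frames):
    """Concatenate per-frame CSI and video feature vectors (assumed fusion)."""
    if len(csi_frames) != len(video_frames):
        raise ValueError("modalities must be aligned frame by frame")
    return [list(c) + list(v) for c, v in zip(csi_frames, video_frames)]


def mask_frames(frames, mask_ratio, rng):
    """Return a copy with a random subset of frames masked, plus their indices."""
    if len(frames) < 2:
        raise ValueError("need at least two frames so one stays visible")
    # Mask at least one frame and keep at least one frame visible.
    n_masked = min(len(frames) - 1, max(1, round(mask_ratio * len(frames))))
    masked_idx = sorted(rng.sample(range(len(frames)), n_masked))
    masked = [list(f) for f in frames]
    for i in masked_idx:
        masked[i] = [MASK_VALUE] * len(frames[i])
    return masked, masked_idx


def predict_from_visible(masked, masked_idx):
    """Stand-in predictor: fill masked frames with the mean of visible frames.

    In the described architecture this role is played by the self-attention
    network; the mean is used here only to keep the sketch dependency-free.
    """
    hidden = set(masked_idx)
    visible = [f for i, f in enumerate(masked) if i not in hidden]
    dim = len(masked[0])
    mean = [sum(f[d] for f in visible) / len(visible) for d in range(dim)]
    return [list(mean) if i in hidden else list(f) for i, f in enumerate(masked)]


def masked_mse(predicted, target, masked_idx):
    """Mean squared error computed over the masked frames only."""
    total, count = 0.0, 0
    for i in masked_idx:
        for p, t in zip(predicted[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: 8 frames, 2 CSI features and 1 video feature per frame.
    csi = [[float(t), 1.0] for t in range(8)]
    video = [[2.0 * t] for t in range(8)]
    fused = fuse(csi, video)
    masked, idx = mask_frames(fused, mask_ratio=0.25, rng=rng)
    prediction = predict_from_visible(masked, idx)
    print("masked frames:", idx)
    print("loss on masked frames:", masked_mse(prediction, fused, idx))
```

Restricting the loss to masked positions is the design choice that makes the objective self-supervised: the visible frames serve as input context and the hidden frames serve as targets, so no activity labels are required at this stage.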