WiFi and vision multimodal self-supervised learning for human activity recognition
Human activity recognition has become increasingly important in various fields such as security, medical care, entertainment, and home furnishing. In this context, we propose a novel Multimodal Pretrain fine-tune (MPF) network architecture for device-free human activity recognition using commercial...
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Master by Coursework |
Language: | English |
Published: |
Nanyang Technological University
2023
|
Subjects: | |
Online Access: | https://hdl.handle.net/10356/171660 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-171660 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-1716602023-11-10T15:44:51Z WiFi and vision multimodal self-supervised learning for human activity recognition Tang, Shijie Xie Lihua School of Electrical and Electronic Engineering ELHXIE@ntu.edu.sg Engineering::Electrical and electronic engineering::Computer hardware, software and systems Human activity recognition has become increasingly important in various fields such as security, medical care, entertainment, and home furnishing. In this context, we propose a novel Multimodal Pretrain fine-tune (MPF) network architecture for device-free human activity recognition using commercial WiFi-enabled IoT devices and cameras. The MPF network architecture is designed to leverage massive unlabeled data for pretraining based on self-supervised learning and little labeled data for fine-tune. In the pretrain stage, we fuse WIFI channel state information and video data and use a multi-head self-attention network for masked data modeling. The network then performs self-supervised learning by predicting the masked data representation. In the subsequent fine-tune stage, we add the GRU layer on the basis of the pre-trained self-attention network to extract temporal features between frames and output human activity recognition results through the linear layer. Experiments show that the proposed MPF network outperforms the baseline network on two human activity recognition datasets and demonstrates strong robustness and generalization ability. Furthermore, the MPF network correctly classifies common human actions and who performs them. Master of Science (Computer Control and Automation) 2023-11-06T00:17:39Z 2023-11-06T00:17:39Z 2023 Thesis-Master by Coursework Tang, S. (2023). WiFi and vision multimodal self-supervised learning for human activity recognition. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/171660 https://hdl.handle.net/10356/171660 en application/pdf Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Electrical and electronic engineering::Computer hardware, software and systems |
spellingShingle |
Engineering::Electrical and electronic engineering::Computer hardware, software and systems Tang, Shijie WiFi and vision multimodal self-supervised learning for human activity recognition |
description |
Human activity recognition has become increasingly important in various fields such as security, medical care, entertainment, and home furnishing. In this context, we propose a novel Multimodal Pretrain fine-tune (MPF) network architecture for device-free human activity recognition using commercial WiFi-enabled IoT devices and cameras.
The MPF network architecture is designed to leverage large amounts of unlabeled data for pretraining based on self-supervised learning, and a small amount of labeled data for fine-tuning. In the pretraining stage, we fuse WiFi channel state information (CSI) and video data and use a multi-head self-attention network for masked data modeling. The network then performs self-supervised learning by predicting the representation of the masked data. In the subsequent fine-tuning stage, we add a GRU layer on top of the pretrained self-attention network to extract temporal features between frames, and output human activity recognition results through a linear layer.
Experiments show that the proposed MPF network outperforms the baseline network on two human activity recognition datasets and demonstrates strong robustness and generalization ability. Furthermore, the MPF network correctly classifies both common human actions and the person performing them. |
author2 |
Xie Lihua |
author_facet |
Xie Lihua Tang, Shijie |
format |
Thesis-Master by Coursework |
author |
Tang, Shijie |
author_sort |
Tang, Shijie |
title |
WiFi and vision multimodal self-supervised learning for human activity recognition |
title_short |
WiFi and vision multimodal self-supervised learning for human activity recognition |
title_full |
WiFi and vision multimodal self-supervised learning for human activity recognition |
title_fullStr |
WiFi and vision multimodal self-supervised learning for human activity recognition |
title_full_unstemmed |
WiFi and vision multimodal self-supervised learning for human activity recognition |
title_sort |
wifi and vision multimodal self-supervised learning for human activity recognition |
publisher |
Nanyang Technological University |
publishDate |
2023 |
url |
https://hdl.handle.net/10356/171660 |
_version_ |
1783955635992264704 |
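
The pretraining stage described in the abstract (fusing WiFi CSI with video, masking part of the input, and predicting the masked content with self-attention) can be illustrated with a small forward-pass sketch. This is not the thesis implementation: the feature sizes, the 25% mask ratio, fusion by per-frame concatenation, the use of a single attention head instead of multiple heads, random untrained weights, and the reconstruction of raw fused features rather than a learned representation are all assumptions made for illustration, as are the function names.

```python
import numpy as np

def fuse_and_mask(csi, video, mask_ratio, rng):
    """Concatenate per-frame CSI and video features, then zero out random frames.

    Fusion by concatenation and zero-masking are assumptions, not the thesis method.
    """
    fused = np.concatenate([csi, video], axis=-1)   # (frames, d_csi + d_video)
    n_frames = fused.shape[0]
    n_masked = max(1, int(round(mask_ratio * n_frames)))
    masked_idx = rng.choice(n_frames, size=n_masked, replace=False)
    mask = np.zeros(n_frames, dtype=bool)
    mask[masked_idx] = True
    corrupted = fused.copy()
    corrupted[mask] = 0.0                           # hide the selected frames
    return fused, corrupted, mask

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the frame axis."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over frames
    return weights @ v

def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error computed only on the masked frames."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))

# Hypothetical toy input: 16 frames, 8 CSI features and 12 video features each.
rng = np.random.default_rng(0)
csi = rng.normal(size=(16, 8))
video = rng.normal(size=(16, 12))

fused, corrupted, mask = fuse_and_mask(csi, video, mask_ratio=0.25, rng=rng)
d = fused.shape[1]
w_q, w_k, w_v = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))

pred = self_attention(corrupted, w_q, w_k, w_v)     # untrained forward pass
loss = masked_reconstruction_loss(pred, fused, mask)
print(f"masked frames: {int(mask.sum())}, reconstruction loss: {loss:.4f}")
```

In a trained system the projection weights would be optimized to reduce this loss on unlabeled data. The fine-tuning stage in the abstract would then place a GRU and a linear classifier on top of the pretrained attention output, which this sketch does not cover.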