WiFi and vision multimodal self-supervised learning for human activity recognition

Human activity recognition has become increasingly important in various fields such as security, medical care, entertainment, and home furnishing. In this context, we propose a novel Multimodal Pretrain fine-tune (MPF) network architecture for device-free human activity recognition using commercial...

Bibliographic Details
Main Author:	Tang, Shijie
Other Authors:	Xie Lihua
Format:	Thesis-Master by Coursework
Language:	English
Published:	Nanyang Technological University 2023
Subjects:	Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Online Access:	https://hdl.handle.net/10356/171660
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-171660
record_format	dspace
spelling	sg-ntu-dr.10356-1716602023-11-10T15:44:51Z WiFi and vision multimodal self-supervised learning for human activity recognition Tang, Shijie Xie Lihua School of Electrical and Electronic Engineering ELHXIE@ntu.edu.sg Engineering::Electrical and electronic engineering::Computer hardware, software and systems Human activity recognition has become increasingly important in various fields such as security, medical care, entertainment, and home furnishing. In this context, we propose a novel Multimodal Pretrain fine-tune (MPF) network architecture for device-free human activity recognition using commercial WiFi-enabled IoT devices and cameras. The MPF network architecture is designed to leverage massive unlabeled data for pretraining based on self-supervised learning and little labeled data for fine-tune. In the pretrain stage, we fuse WIFI channel state information and video data and use a multi-head self-attention network for masked data modeling. The network then performs self-supervised learning by predicting the masked data representation. In the subsequent fine-tune stage, we add the GRU layer on the basis of the pre-trained self-attention network to extract temporal features between frames and output human activity recognition results through the linear layer. Experiments show that the proposed MPF network outperforms the baseline network on two human activity recognition datasets and demonstrates strong robustness and generalization ability. Furthermore, the MPF network correctly classifies common human actions and who performs them. Master of Science (Computer Control and Automation) 2023-11-06T00:17:39Z 2023-11-06T00:17:39Z 2023 Thesis-Master by Coursework Tang, S. (2023). WiFi and vision multimodal self-supervised learning for human activity recognition. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/171660 https://hdl.handle.net/10356/171660 en application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Electrical and electronic engineering::Computer hardware, software and systems
spellingShingle	Engineering::Electrical and electronic engineering::Computer hardware, software and systems Tang, Shijie WiFi and vision multimodal self-supervised learning for human activity recognition
description	Human activity recognition has become increasingly important in fields such as security, medical care, entertainment, and home furnishing. In this context, we propose a novel Multimodal Pretrain fine-tune (MPF) network architecture for device-free human activity recognition using commercial WiFi-enabled IoT devices and cameras. The MPF network is designed to leverage large amounts of unlabeled data for self-supervised pretraining and a small amount of labeled data for fine-tuning. In the pretraining stage, we fuse WiFi channel state information (CSI) and video data and use a multi-head self-attention network for masked data modeling. The network then performs self-supervised learning by predicting the representation of the masked data. In the subsequent fine-tuning stage, we add a GRU layer on top of the pretrained self-attention network to extract temporal features between frames and output human activity recognition results through a linear layer. Experiments show that the proposed MPF network outperforms the baseline network on two human activity recognition datasets and demonstrates strong robustness and generalization ability. Furthermore, the MPF network correctly identifies both common human actions and the person performing them.
author2	Xie Lihua
author_facet	Xie Lihua Tang, Shijie
format	Thesis-Master by Coursework
author	Tang, Shijie
author_sort	Tang, Shijie
title	WiFi and vision multimodal self-supervised learning for human activity recognition
title_short	WiFi and vision multimodal self-supervised learning for human activity recognition
title_full	WiFi and vision multimodal self-supervised learning for human activity recognition
title_fullStr	WiFi and vision multimodal self-supervised learning for human activity recognition
title_full_unstemmed	WiFi and vision multimodal self-supervised learning for human activity recognition
title_sort	wifi and vision multimodal self-supervised learning for human activity recognition
publisher	Nanyang Technological University
publishDate	2023
url	https://hdl.handle.net/10356/171660
_version_	1783955635992264704
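
The pretraining step summarized in the description field (fusing per-frame WiFi CSI and video features, masking part of the sequence, and scoring predictions on the masked part) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the zero-valued mask placeholder, the 50% mask ratio, and the mean-squared-error objective on raw features are all assumptions made for illustration, and the self-attention encoder that would produce the predictions is left out.

```python
import random

MASK_VALUE = 0.0  # hypothetical placeholder written into masked frames


def fuse_frames(csi_frames, video_frames):
    """Concatenate per-frame CSI and video feature vectors into one token per frame."""
    if len(csi_frames) != len(video_frames):
        raise ValueError("both modalities must cover the same number of frames")
    return [list(c) + list(v) for c, v in zip(csi_frames, video_frames)]


def mask_frames(tokens, mask_ratio=0.5, seed=0):
    """Hide a random subset of frames; return masked input, masked indices, targets."""
    if not tokens:
        raise ValueError("need at least one frame to mask")
    rng = random.Random(seed)
    # Mask at least one frame and never more than the sequence length.
    n_masked = min(len(tokens), max(1, round(len(tokens) * mask_ratio)))
    masked_idx = sorted(rng.sample(range(len(tokens)), n_masked))
    masked_set = set(masked_idx)
    masked_input = [
        [MASK_VALUE] * len(tok) if i in masked_set else list(tok)
        for i, tok in enumerate(tokens)
    ]
    # Targets are the original fused features of the hidden frames (an assumption).
    targets = {i: list(tokens[i]) for i in masked_idx}
    return masked_input, masked_idx, targets


def reconstruction_loss(predictions, targets):
    """Mean squared error computed over the masked frames only."""
    total, count = 0.0, 0
    for i, target in targets.items():
        for p, t in zip(predictions[i], target):
            total += (p - t) ** 2
            count += 1
    return total / count
```

In a full pipeline, an encoder would map `masked_input` to `predictions`, and only the masked positions would contribute to the loss; the labeled fine-tuning stage with a GRU and linear layer is outside the scope of this sketch.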

Similar Items