WiFi and vision multimodal self-supervised learning for human activity recognition

Human activity recognition has become increasingly important in various fields such as security, medical care, entertainment, and home furnishing. In this context, we propose a novel Multimodal Pretrain fine-tune (MPF) network architecture for device-free human activity recognition using commercial...

Bibliographic Details
Main Author: Tang, Shijie
Other Authors: Xie Lihua
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2023
Subjects: Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Online Access: https://hdl.handle.net/10356/171660
Institution: Nanyang Technological University
id sg-ntu-dr.10356-171660
record_format dspace
spelling sg-ntu-dr.10356-171660 2023-11-10T15:44:51Z WiFi and vision multimodal self-supervised learning for human activity recognition; Tang, Shijie; Xie Lihua; School of Electrical and Electronic Engineering; ELHXIE@ntu.edu.sg; Engineering::Electrical and electronic engineering::Computer hardware, software and systems; Master of Science (Computer Control and Automation); 2023-11-06T00:17:39Z; 2023; Thesis-Master by Coursework; Tang, S. (2023). WiFi and vision multimodal self-supervised learning for human activity recognition. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/171660; en; application/pdf; Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Electrical and electronic engineering::Computer hardware, software and systems
spellingShingle Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Tang, Shijie
WiFi and vision multimodal self-supervised learning for human activity recognition
description Human activity recognition has become increasingly important in fields such as security, medical care, entertainment, and home furnishing. In this context, we propose a novel Multimodal Pretrain fine-tune (MPF) network architecture for device-free human activity recognition using commercial WiFi-enabled IoT devices and cameras. The MPF architecture is designed to leverage large amounts of unlabeled data for self-supervised pretraining and a small amount of labeled data for fine-tuning. In the pretraining stage, we fuse WiFi channel state information (CSI) and video data and use a multi-head self-attention network for masked data modeling: the network learns in a self-supervised manner by predicting the representation of the masked data. In the subsequent fine-tuning stage, we add a GRU layer on top of the pretrained self-attention network to extract temporal features between frames, and a linear layer outputs the human activity recognition results. Experiments show that the proposed MPF network outperforms the baseline network on two human activity recognition datasets and demonstrates strong robustness and generalization ability. Furthermore, the MPF network correctly classifies both common human actions and the person performing them.
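The masked data modeling step described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the fusion by per-frame concatenation, the zero-vector mask, the 30% mask ratio, the feature sizes, and the function names `fuse_frames`, `mask_frames`, and `masked_mse` are all assumptions made for illustration. The abstract says the network predicts the masked data *representation*; here the fused input itself stands in as the target, and the self-attention encoder is replaced by a placeholder.

```python
import random


def fuse_frames(csi_frames, video_frames):
    """Concatenate per-frame WiFi CSI and video feature vectors (assumed fusion)."""
    if len(csi_frames) != len(video_frames):
        raise ValueError("both modalities must have the same number of frames")
    return [list(c) + list(v) for c, v in zip(csi_frames, video_frames)]


def mask_frames(frames, mask_ratio, rng):
    """Zero out a random subset of frames; return the masked copy and the masked indices."""
    n_masked = min(len(frames), max(1, round(mask_ratio * len(frames))))
    masked_idx = sorted(rng.sample(range(len(frames)), n_masked))
    masked = [list(f) for f in frames]
    for i in masked_idx:
        masked[i] = [0.0] * len(frames[i])
    return masked, masked_idx


def masked_mse(predicted, target, masked_idx):
    """Mean squared error computed over the masked frames only."""
    total, count = 0.0, 0
    for i in masked_idx:
        for p, t in zip(predicted[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count


# Hypothetical toy data: 10 frames, 4 CSI features and 6 video features per frame.
rng = random.Random(0)
csi = [[rng.random() for _ in range(4)] for _ in range(10)]
video = [[rng.random() for _ in range(6)] for _ in range(10)]

fused = fuse_frames(csi, video)
masked, idx = mask_frames(fused, mask_ratio=0.3, rng=rng)

# Placeholder prediction: a real model would pass `masked` through a
# multi-head self-attention encoder; here the masked input is used unchanged.
loss = masked_mse(masked, fused, idx)
```

Restricting the loss to the masked frames means the network gets no credit for copying the visible input, so it has to infer the hidden frames from the surrounding context in both modalities.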
author2 Xie Lihua
author_facet Xie Lihua
Tang, Shijie
format Thesis-Master by Coursework
author Tang, Shijie
author_sort Tang, Shijie
title WiFi and vision multimodal self-supervised learning for human activity recognition
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/171660
_version_ 1783955635992264704