Towards robust sensing and recognition: from statistical learning to transfer learning

Bibliographic Details
Main Author: Yang, Jianfei
Other Authors: Xie, Lihua
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2020
Online Access:https://hdl.handle.net/10356/145444
Institution: Nanyang Technological University
Description
Summary: Sensing and recognition technologies play an essential role in smart cities. Big data is acquired by various sensors deployed in almost every corner of our daily lives, quantifying human activities and social systems. Smart sensing significantly facilitates people's lives and promotes the development of smart cities. This success owes not only to the abundant data supplied by a variety of sensors but also to recognition technology driven by advances in machine learning. As different sensors offer heterogeneous data, traditional statistical models may not be capable of handling such high-dimensional data. To this end, neural networks with strong fitting ability were proposed, which evolved into deep neural networks as layers were stacked deeper for greater feature-extraction capacity. Using deep learning, data from distinct sensors can be learned jointly for accurate recognition. Nevertheless, on the way to robust smart sensing and recognition, some bottlenecks still hinder sensing performance and restrict the application scenarios. This thesis proposes systematic solutions and algorithms to overcome these long-standing challenges from two perspectives: sensing tools and recognition models.

From the perspective of sensing tools, different sensors suit different scenarios: for instance, the Inertial Measurement Unit (IMU) serves mobile devices, while cameras take charge of surveillance. In smart homes, however, existing sensors have intrinsic shortcomings, such as the privacy issues of cameras, the need to carry devices for IMU sensing, and the low data granularity of the Pyroelectric Infrared (PIR) sensor. To obtain a cost-effective, privacy-preserving, and fine-grained sensing tool, this thesis pioneers a WiFi-based sensing platform that extracts Channel State Information (CSI) data directly from routers rather than from specialized network cards. CSI describes the propagation of wireless signals, which is affected by human motion; these patterns of signal variation make WiFi-based sensing possible. The proposed platform is IoT-enabled and leverages commercial off-the-shelf (COTS) WiFi routers for sensing. To demonstrate its effectiveness, we develop an occupancy detection system and a sedentary behavior monitoring system and evaluate their performance in real-world scenarios. Traditional statistical features are employed for static and dynamic activity recognition.

We further propose a novel approach for a more fine-grained recognition application: WiFi-based gesture recognition on the platform. Since gestures are too subtle to capture cleanly, considerable environmental noise is recorded in CSI samples, and extracting effective recognition features from such noisy data with deep neural networks is non-trivial. To this end, we propose a Siamese deep model with both spatial and temporal feature extractors, which discards the intrinsic noise of CSI data during feature learning. The proposed method also allows users to fine-tune the system with only a few samples, and is thus user-friendly.
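The abstract describes the Siamese spatial-temporal model only at a high level. The sketch below shows one plausible shape for such a model in PyTorch: a 1-D convolution over subcarriers (spatial) feeding a GRU over time (temporal), trained with a contrastive pair loss. The window size (100 time steps × 30 subcarriers), layer widths, and loss are illustrative assumptions, not the thesis's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CSIEncoder(nn.Module):
    """Spatial-temporal encoder: a 1-D CNN over subcarriers feeds a GRU over time."""
    def __init__(self, n_subcarriers=30, hidden=64):
        super().__init__()
        self.spatial = nn.Sequential(                 # per-frame subcarrier features
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(8),
        )
        self.temporal = nn.GRU(32 * 8, hidden, batch_first=True)

    def forward(self, x):                             # x: (batch, time, subcarriers)
        b, t, s = x.shape
        h = self.spatial(x.reshape(b * t, 1, s)).reshape(b, t, -1)
        _, h_n = self.temporal(h)                     # last hidden state summarizes the window
        return F.normalize(h_n[-1], dim=1)            # unit-length embedding

def contrastive_loss(z1, z2, same, margin=1.0):
    """Pull same-gesture pairs together; push different-gesture pairs apart."""
    d = (z1 - z2).pow(2).sum(1).sqrt()
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

# Toy usage: 4 CSI window pairs of 100 frames x 30 subcarriers per branch.
enc = CSIEncoder()
x1, x2 = torch.randn(4, 100, 30), torch.randn(4, 100, 30)
same = torch.tensor([1., 1., 0., 0.])                 # 1 = same gesture, 0 = different
print(contrastive_loss(enc(x1), enc(x2), same).item())
```

Because a Siamese pair loss learns a distance rather than fixed class boundaries, it also suggests why few-shot user fine-tuning is feasible: new gestures only need a handful of reference embeddings.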
From the perspective of recognition, this thesis focuses on the domain shift problem that hinders the generalization of recognition models. For instance, in WiFi-based activity recognition, a model trained in one environment cannot generalize well to a new environment because of the spatial shift between environments. In visual sensing, images are captured by cameras of various brands and parameters, which degrades visual recognition models even more severely. To deal with these problems, domain adaptation minimizes the distribution discrepancy between a source domain (e.g., the original environment used for training) and a target domain (e.g., the new environment used for testing). This thesis contributes four novel algorithms for real-world domain adaptation scenarios.

Firstly, we formulate the Edge Domain Adaptation (EDA) problem, which addresses the domain shift between a model trained on a central server and models running on edge devices. MobileDA is then proposed to solve this problem by jointly minimizing domain discrepancy and model complexity, bridging the gap between domain adaptation algorithms and edge computing. Secondly, current domain adaptation methods require careful manual hyper-parameter tuning, which is unrealistic for unmanned systems such as smart cars. To stabilize the training of adversarial domain adaptation, Max-margin Domain-Adversarial Training (MDAT) is developed in this thesis to achieve stable convergence without cumbersome manual tuning. Thirdly, regarding cross-domain feature learning, we further study the transferability and discriminability of deep features generated by domain adaptation models, and achieve better performance by preserving feature discriminability. This motivates us to propose Asymmetric Adversarial Domain Adaptation (AADA), which upgrades MDAT into a more generic framework and also helps explain the mechanism underlying MDAT. Fourthly, Imbalanced Domain Adaptation (IDA) is formulated for real-world imbalanced data, since label shift commonly occurs in realistic data distributions. As the source and target domains may have imbalanced classes, IDA refers to the situation in which covariate shift and label shift exist simultaneously. The proposed Cluster-level Discrepancy Minimization (CDM) performs domain alignment after unsupervised clustering, effectively handling both the domain shift and the label shift.

The four domain adaptation algorithms have been evaluated on public visual benchmarks (e.g., Image-CLEF and Office-Home) and on a WiFi-based gesture recognition dataset. They achieve state-of-the-art performance while meeting their specific goals, such as training stability and more discriminative features.

In summary, this thesis contributes to robust sensing and recognition technology. For smart sensing, the WiFi-based sensing platform provides fine-grained CSI data, which can empower many applications including human activity recognition, gesture recognition, person identification, crowd counting, and indoor localization. WiFi-based sensing is a necessary complement to seamless sensing in smart homes and buildings. For robust recognition, four domain adaptation algorithms are proposed in the thesis. They give strong impetus to real-world domain adaptation scenarios such as long-tailed data and automatic model adaptation on edge devices, and can further be applied to improving the generalization ability of deep models for WiFi-based and visual sensing, including object recognition, object detection, semantic segmentation, and person re-identification.
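The adversarial alignment idea shared by methods such as MDAT and AADA can be made concrete with a short sketch. The following is a generic domain-adversarial training step built on a gradient-reversal layer; the feature dimension, class count, network shapes, and the weight lam are illustrative assumptions, and this is standard domain-adversarial training rather than the thesis's own MobileDA, MDAT, AADA, or CDM objectives.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negated, scaled gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

feature = nn.Sequential(nn.Linear(256, 128), nn.ReLU())   # shared feature extractor
classifier = nn.Linear(128, 10)                           # task head (10 classes, assumed)
discriminator = nn.Linear(128, 2)                         # source-vs-target head
opt = torch.optim.Adam([*feature.parameters(), *classifier.parameters(),
                        *discriminator.parameters()], lr=1e-3)
ce = nn.CrossEntropyLoss()

def adaptation_step(xs, ys, xt, lam=0.5):
    """One step on a labeled source batch (xs, ys) and an unlabeled target batch xt."""
    fs, ft = feature(xs), feature(xt)
    task_loss = ce(classifier(fs), ys)                    # supervised loss on source only
    dom_in = torch.cat([GradReverse.apply(fs, lam), GradReverse.apply(ft, lam)])
    dom_y = torch.cat([torch.zeros(len(xs)), torch.ones(len(xt))]).long()
    dom_loss = ce(discriminator(dom_in), dom_y)           # discriminator learns domains;
    opt.zero_grad()                                       # the reversed gradient pushes the
    (task_loss + dom_loss).backward()                     # features to be domain-invariant
    opt.step()
    return task_loss.item(), dom_loss.item()

# Toy usage with pre-extracted 256-d features.
xs, ys = torch.randn(8, 256), torch.randint(0, 10, (8,))
xt = torch.randn(8, 256)
print(adaptation_step(xs, ys, xt))
```

A max-margin variant in the spirit of MDAT would replace the discriminator's cross-entropy objective with a margin-based loss to stabilize the adversarial game; the skeleton above only indicates where such a change would plug in.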