Towards robust and label-efficient time series representation learning

Time series data consists of sequential measurements collected over time from various sources across applications such as healthcare and manufacturing. As more time series data is generated by these applications, analyzing it to extract insights becomes increasingly important. Deep learning has...

Full description

Saved in:
Bibliographic Details
Main Author: Emadeldeen Ahmed Ibrahim Ahmed Eldele
Other Authors: Kwoh Chee Keong
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2023
Subjects:
Online Access:https://hdl.handle.net/10356/170673
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-170673
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
spellingShingle Engineering::Computer science and engineering
Emadeldeen Ahmed Ibrahim Ahmed Eldele
Towards robust and label-efficient time series representation learning
description Time series data consists of sequential measurements collected over time from various sources across applications such as healthcare and manufacturing. As more time series data is generated by these applications, analyzing it to extract insights becomes increasingly important. Deep learning has shown a proven capability for automatic learning from massive data by identifying complex patterns and representations directly from the data. However, current deep learning-based models suffer from significant limitations. First, they lack the ability to efficiently learn temporal relations in time series while exploiting parallel processing. Second, these models require large amounts of labeled data for training, which can be difficult to obtain, especially for complex time series data. Third, their generalization capability is limited: performance deteriorates when transferring knowledge from a labeled source domain to an out-of-distribution unlabeled target domain. In this thesis, we address these problems and provide solutions for the real-world deployment of deep learning models for time series data. We first propose a novel attention-based deep learning architecture called AttnSleep to classify EEG-based sleep stages, one of the most common types of time series healthcare data. Specifically, we propose a powerful feature extractor that learns from different frequency bands in EEG signals. We also propose a temporal context encoder module that learns the temporal dependencies among extracted features using a causal multi-head attention mechanism. Last, we develop a class-aware loss function that addresses the class-imbalance problem in sleep data without incurring any additional computational cost.
Next, we propose two frameworks to address the label scarcity problem in different settings. The first framework, TS-TCC, is a self-supervised learning approach that learns useful representations from unlabeled data. TS-TCC applies time series-specific augmentations to generate two views of each sample. We then learn temporal representations via a novel cross-view temporal prediction task. Furthermore, we propose a contextual contrasting module that further learns discriminative representations. The second framework, CA-TCC, improves the representations learned by TS-TCC in semi-supervised settings by training the model in four phases. First, we perform self-supervised training with TS-TCC. Then, we fine-tune the pretrained model on the few available labeled samples. Next, we use the fine-tuned model to assign pseudo labels to the unlabeled set. Finally, we leverage these pseudo labels to realize a class-aware contrastive loss for semi-supervised training. Both frameworks show significant performance improvements over traditional supervised training when only a few labeled samples are available.
Last, we tackle the domain shift problem and propose two novel frameworks to address it. In the first framework, we introduce an adversarial domain adaptation technique named ADAST that addresses two challenges: the loss of domain-specific information during feature extraction and the neglect of target-domain class information during domain alignment. To overcome these challenges, we incorporate an unshared attention mechanism and an iterative self-training strategy with dual distinct classifiers. In the second framework, we avoid the complexity of adversarial training and present a novel approach called CoTMix that addresses domain shift with a simple yet effective contrastive learning strategy. Specifically, we propose a cross-domain temporal mixup strategy that creates source-dominant and target-dominant domains, which serve as augmented views of the source and target domains for contrastive learning. Unlike prior works, CoTMix maps the source and target domains to an intermediate domain. These frameworks improve the robustness of deep learning models on time series data.
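The cross-domain temporal mixup described above can be illustrated with a small sketch. This is an illustrative approximation, not the thesis's implementation: it assumes the dominant-domain sample keeps the larger mixing share (lam > 0.5) and that the other domain contributes the mean of a temporal window around each timestep; the function names (`temporal_window_mean`, `cross_domain_mixup`) and parameter values are hypothetical.

```python
import numpy as np

def temporal_window_mean(x, half_width):
    # For each timestep t, average x over the window [t - half_width, t + half_width],
    # clipped to the series boundaries. x has shape (T, channels).
    T = x.shape[0]
    out = np.empty_like(x, dtype=float)
    for t in range(T):
        lo, hi = max(0, t - half_width), min(T, t + half_width + 1)
        out[t] = x[lo:hi].mean(axis=0)
    return out

def cross_domain_mixup(x_dominant, x_other, lam=0.9, half_width=2):
    # Mix a dominant-domain sample with the temporal-neighborhood mean of a
    # sample from the other domain; lam > 0.5 keeps the dominant domain's share larger.
    assert 0.5 < lam <= 1.0, "dominant domain should keep the larger share"
    return lam * x_dominant + (1.0 - lam) * temporal_window_mean(x_other, half_width)

# Toy univariate series standing in for a source and a target sample.
x_src = np.sin(np.linspace(0, 2 * np.pi, 50))[:, None]
x_tgt = np.cos(np.linspace(0, 2 * np.pi, 50))[:, None]

src_dominant = cross_domain_mixup(x_src, x_tgt)  # augmented view paired with x_src
tgt_dominant = cross_domain_mixup(x_tgt, x_src)  # augmented view paired with x_tgt
```

Each mixed sample stays close to its dominant domain while carrying a trace of the other, so the two views can serve as positives for their respective domains in a contrastive loss.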
author2 Kwoh Chee Keong
author_facet Kwoh Chee Keong
Emadeldeen Ahmed Ibrahim Ahmed Eldele
format Thesis-Doctor of Philosophy
author Emadeldeen Ahmed Ibrahim Ahmed Eldele
author_sort Emadeldeen Ahmed Ibrahim Ahmed Eldele
title Towards robust and label-efficient time series representation learning
title_short Towards robust and label-efficient time series representation learning
title_full Towards robust and label-efficient time series representation learning
title_fullStr Towards robust and label-efficient time series representation learning
title_full_unstemmed Towards robust and label-efficient time series representation learning
title_sort towards robust and label-efficient time series representation learning
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/170673
_version_ 1779171086505607168
spelling sg-ntu-dr.10356-1706732023-10-03T09:52:45Z Towards robust and label-efficient time series representation learning Emadeldeen Ahmed Ibrahim Ahmed Eldele Kwoh Chee Keong School of Computer Science and Engineering Agency for Science, Technology and Research (A*STAR) ASCKKWOH@ntu.edu.sg Engineering::Computer science and engineering Doctor of Philosophy 2023-09-26T01:49:23Z 2023-09-26T01:49:23Z 2023 Thesis-Doctor of Philosophy Emadeldeen Ahmed Ibrahim Ahmed Eldele (2023). Towards robust and label-efficient time series representation learning. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/170673 https://hdl.handle.net/10356/170673 10.32657/10356/170673 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University