Machine learning-based approaches for large-scale temporal data analytics

In many domains such as telecommunications, finance and sensor monitoring, large volumes of unlabeled temporal data are continuously generated in a sequential format, where the timestamps of the generated records are available. From a data analysis standpoint, there is significant utility to be gai...

Bibliographic Details
Main Author: Seyed Ali Majid Zonoozi
Other Authors: Cong Gao
Format: Theses and Dissertations
Language: English
Published: 2019
Subjects:
Online Access: https://hdl.handle.net/10356/105859
http://hdl.handle.net/10220/47870
https://doi.org/10.32657/10220/47870
Institution: Nanyang Technological University
id sg-ntu-dr.10356-105859
record_format dspace
institution Nanyang Technological University
building NTU Library
country Singapore
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Mathematics of computing::Mathematical software
DRNTU::Engineering::Computer science and engineering::Information systems::Models and principles
description In many domains such as telecommunications, finance and sensor monitoring, large volumes of unlabeled temporal data are continuously generated in a sequential format, with a timestamp available for each record. From a data analysis standpoint, there is significant utility in discovering the patterns in such data, in order to understand how the data has changed in the past and how it is likely to change in the future. This thesis considers three types of change that occur in temporal data (changes in the underlying data distribution, recurring concepts, and outliers) and proposes machine learning-based models to track the changes, predict future values and detect abnormal patterns in temporal settings.
Concept drift refers to changes in the underlying data distribution over time. Many studies have addressed concept drift in supervised settings, where changes in the underlying data distribution can adversely affect supervised models. However, limited work has been done from an unsupervised point of view to understand the changes in data distribution in an interpretable way. In our first study, a scalable optimization model is proposed to track multiple concepts, and the participation of actors in each concept, over time, and thereby to track changes in the underlying distribution of the data. The proposed concept tracking method applies to problem settings that cannot be handled by existing concept drift and stream mining methods, and outperforms popular unsupervised baselines from the wider data mining and machine learning literature.
Recurring concepts are a form of concept drift in which a previously observed pattern recurs in the data after some time. Understanding and utilizing such patterns is beneficial, especially in forecasting problems. In some spatio-temporal settings, such as crowd density in urban environments, recurring periodic patterns can be observed, but they have not been considered explicitly in previous work on spatio-temporal forecasting. To address this issue, our second study proposes a novel deep learning-based method that captures spatial and temporal correlations in geo-spatial data. The proposed model learns and incorporates explicit periodic representations at different temporal scales, and can be optimized for multistep-ahead prediction. Experiments on two real-world taxi datasets show that the proposed prediction model, enhanced with periodic representations, outperforms the linear and non-linear prediction baselines it was compared against.
Outliers are another type of change in temporal data. Current approaches to outlier detection in spatio-temporal settings use reconstruction-based or representation learning-based methods to learn the normal pattern of the data and detect instances that deviate from it (global anomalies). These approaches fail to detect temporal anomalies: instances that are similar to normal instances but occur in an unusual time context. To address this problem, our last study proposes a temporal anomaly detection model. The proposed model captures spatial and temporal dependencies in spatio-temporal sequences and detects abnormal crowd density patterns in the geo-spatial domain using the prediction error of temporal meta-data. Experiments on real-world taxi datasets show that the proposed model is effective in detecting abnormal crowd density patterns in geo-spatio-temporal environments that cannot be detected by existing reconstruction-based anomaly detection methods.
author2 Cong Gao
author_facet Cong Gao
Seyed Ali Majid Zonoozi
format Theses and Dissertations
author Seyed Ali Majid Zonoozi
author_sort Seyed Ali Majid Zonoozi
title Machine learning-based approaches for large-scale temporal data analytics
publishDate 2019
_version_ 1681049581072154624
spelling sg-ntu-dr.10356-105859 2019-12-06T21:59:24Z
Machine learning-based approaches for large-scale temporal data analytics
Seyed Ali Majid Zonoozi
Cong Gao
School of Computer Science and Engineering
A*STAR Institute for Infocomm Research
Doctor of Philosophy
2019-03-20T14:10:54Z 2019-12-06T21:59:24Z
2018
Thesis
Seyed Ali Majid Zonoozi. (2018). Machine learning-based approaches for large-scale temporal data analytics. Doctoral thesis, Nanyang Technological University, Singapore.
https://hdl.handle.net/10356/105859
http://hdl.handle.net/10220/47870
https://doi.org/10.32657/10220/47870
en
127 p.
application/pdf