Machine learning-based approaches for large-scale temporal data analytics

Bibliographic Details
Main Author: Seyed Ali Majid Zonoozi
Other Authors: Cong Gao
Format: Theses and Dissertations
Language: English
Published: 2019
Subjects:
Online Access: https://hdl.handle.net/10356/105859
http://hdl.handle.net/10220/47870
https://doi.org/10.32657/10220/47870
Institution: Nanyang Technological University
Description
Summary: In many domains such as telecommunications, finance and sensor monitoring, large volumes of unlabeled temporal data are continuously generated in a sequential format, where the timestamps of the generated records are available. From a data analysis standpoint, there is significant utility to be gained by discovering the patterns in the data, in order to understand past trends and anticipate future ones. In this study, by considering different types of changes that occur in temporal data, including changes in the underlying data distribution, recurring concepts and outliers, different machine learning-based models are proposed to track the changes, predict future values and detect abnormal patterns in temporal settings.
The phenomenon of concept drift refers to changes in the underlying data distribution over time. Many studies have been conducted to handle the concept drift problem in supervised settings, where changes in the underlying data distribution can adversely affect supervised models. However, limited work has been done from an unsupervised point of view, in order to understand the changes in data distribution in an interpretable way. In our first study, a scalable optimization model is proposed to track multiple concepts and the participation of actors in each concept over time, in order to track the changes in the underlying data distribution. The proposed concept tracking method applies to problem settings that cannot be handled by existing concept drift and stream mining methods, and outperforms popular unsupervised baselines from the wider Data Mining and Machine Learning literature.
Recurring concepts are a form of concept drift, where a previously observed pattern recurs in the data after some time. Understanding and utilizing such patterns is beneficial, especially in forecasting problems.
In some spatio-temporal settings such as crowd density in urban environments, recurring periodic patterns can be observed, which have not been considered explicitly in previous works on spatio-temporal forecasting. To address this issue, a novel deep learning-based method is proposed in our second study, to accurately capture spatial and temporal correlations in geo-spatial data. The proposed model is able to learn and incorporate explicit periodic representations at different temporal scales, and can be optimized for multi-step-ahead prediction. By conducting experiments on two real-world taxi datasets, we show that the proposed prediction model, enhanced with periodic representations, outperforms other linear and non-linear prediction baselines.
Outliers are another type of change in temporal data. Current approaches for outlier detection in spatio-temporal settings use reconstruction-based or representation learning-based approaches to learn the normal pattern of data and detect instances which deviate from the normal pattern (global anomalies). These approaches fail to detect temporal anomaly instances, which are similar to normal instances but occur in an unusual time context. To address this problem, a temporal anomaly detection model is proposed in our last study. The proposed model can effectively capture spatial and temporal dependencies in spatio-temporal sequences and detect abnormal crowd density patterns in the geo-spatial domain, using the prediction error of temporal meta-data. Experiments on real-world taxi datasets show the effectiveness of the proposed model in detecting abnormal crowd density patterns in geo-spatio-temporal environments that existing reconstruction-based anomaly detection methods cannot detect.
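The idea of scoring temporal anomalies through the prediction error of temporal meta-data can be illustrated with a toy sketch. It is not the model proposed in the thesis, which is described as capturing spatial and temporal dependencies in spatio-temporal sequences. Here a simple nearest-centroid predictor stands in for it, and the two-region synthetic density pattern, the function names and the choice of hour of day as the meta-data are all assumptions made for illustration. An observation that looks normal but is attached to an unusual hour receives a high score because its hour is poorly predicted.

```python
import numpy as np

def daily_pattern(hour):
    """Hypothetical crowd density in two regions as a function of hour of day."""
    angle = 2 * np.pi * np.asarray(hour) / 24
    return np.stack([10 + 5 * np.sin(angle), 10 + 5 * np.cos(angle)], axis=-1)

def fit_hour_centroids(X, hours, n_hours=24):
    """Mean density vector per hour (a stand-in for a learned predictor)."""
    return np.stack([X[hours == h].mean(axis=0) for h in range(n_hours)])

def hour_probabilities(centroids, x):
    """Softmax over negative distances: a distribution over candidate hours."""
    distances = np.linalg.norm(centroids - x, axis=1)
    logits = -distances
    logits -= logits.max()  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def temporal_anomaly_score(centroids, x, hour):
    """Negative log-probability of the recorded hour given the observation."""
    return -np.log(hour_probabilities(centroids, x)[hour] + 1e-12)

# Synthetic training data (assumed): 50 days of hourly, noisy observations.
rng = np.random.default_rng(0)
hours = np.tile(np.arange(24), 50)
X = daily_pattern(hours) + rng.normal(scale=0.3, size=(hours.size, 2))
centroids = fit_hour_centroids(X, hours)

# The same density pattern, recorded at its usual hour and at an unusual one.
observation = daily_pattern(3)
score_usual = temporal_anomaly_score(centroids, observation, hour=3)
score_unusual = temporal_anomaly_score(centroids, observation, hour=15)
print(score_usual < score_unusual)
```

In this sketch the observation itself is a perfectly ordinary pattern, so a detector that only measures how well the pattern can be reconstructed would have little reason to flag it; only the mismatch with the recorded hour raises the score.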