Machine learning-based approaches for large-scale temporal data analytics
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2019 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/105859 http://hdl.handle.net/10220/47870 https://doi.org/10.32657/10220/47870 |
Institution: | Nanyang Technological University |
Summary: | In many domains such as telecommunications, finance and sensor monitoring, large volumes
of unlabeled temporal data are continuously generated in a sequential format, where the timestamps of the generated records are available. From a data analysis standpoint, there is significant utility to be gained by discovering the patterns in the data, in order to understand the
trend of changes in the past and future. In this study, by considering different types of changes
that occur in temporal data, including changes in underlying data distribution, recurring concepts
and outliers, different machine learning-based models are proposed to track the changes,
predict future values and detect abnormal patterns in temporal settings.
The phenomenon of concept drift refers to the changes in the underlying data distribution
over time. Many studies have been conducted to handle the concept drift problem in supervised
settings, where the changes in the underlying data distribution can adversely affect supervised
models. However, limited work has been done from an unsupervised point of view, in order
to understand the changes in data distribution in an interpretable way. In our first study, a
scalable optimization model is proposed to track multiple concepts and the participation of actors
in each concept over time, and thereby the changes in the underlying data distribution.
The proposed concept tracking method applies to problem settings that cannot be handled
by existing concept drift and stream mining methods, and outperforms popular unsupervised
baselines from the wider Data Mining and Machine Learning literature.
Recurring concepts are a form of concept drift, in which a previously observed pattern recurs
in the data after some time. Understanding and utilizing such patterns is beneficial, especially
in forecasting problems. In some spatio-temporal settings such as crowd density in urban
environments, recurring periodic patterns can be observed, which have not been considered
explicitly in previous work on spatio-temporal forecasting. To address this issue, a novel deep
learning-based method is proposed in our second study to accurately capture spatial and temporal
correlations in geo-spatial data. The proposed model is able to learn and incorporate
explicit periodic representations at different temporal scales, and can be optimized for
multi-step-ahead prediction. In experiments on two real-world taxi datasets, we show
that the proposed prediction model, enhanced with periodic representations, outperforms other
linear and non-linear prediction baselines.
Outliers are another type of change in temporal data. Current methods for outlier detection
in spatio-temporal settings use reconstruction-based or representation learning-based approaches
to learn the normal pattern of data and detect instances that deviate from it (global anomalies). These approaches fail to detect temporal anomalies, that is, instances that
are similar to normal instances but occur in an unusual time context. To address this problem,
a temporal anomaly detection model is proposed in our last study. The proposed model can
effectively capture spatial and temporal dependencies in spatio-temporal sequences and detect
abnormal crowd density patterns in the geo-spatial domain, using temporal meta-data prediction
error. Experiments on real-world taxi datasets show the effectiveness
of the proposed model in detecting abnormal crowd density patterns in geo-spatio-temporal
environments that cannot be detected by existing reconstruction-based anomaly detection
methods. |
---|
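
The distinction the summary draws between global anomalies and temporal anomalies (instances that look normal but occur in an unusual time context) can be illustrated with a small toy sketch. This is not the thesis model, which the summary describes only at a high level: here a nearest-mean-profile classifier stands in for the learned predictor, the "temporal meta-data" is assumed to be the hour of day, and every name and the synthetic data (`toy_density`, `fit_hour_profiles`, and so on) are hypothetical.

```python
import math

def circular_hour_distance(a, b, period=24):
    """Distance between two hours on a 24-hour clock."""
    d = abs(a - b) % period
    return min(d, period - d)

def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def fit_hour_profiles(observations):
    """observations: iterable of (hour, density_vector).
    Returns {hour: mean density vector} as a stand-in for a learned model."""
    sums, counts = {}, {}
    for hour, vec in observations:
        acc = sums.setdefault(hour, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[hour] = counts.get(hour, 0) + 1
    return {h: [s / counts[h] for s in acc] for h, acc in sums.items()}

def predict_hour(profiles, vec):
    """Predict the hour whose mean profile is closest to the observation."""
    return min(profiles, key=lambda h: euclidean(profiles[h], vec))

def temporal_anomaly_score(profiles, hour, vec):
    """Meta-data prediction error: predicted hour vs. actual hour."""
    return circular_hour_distance(predict_hour(profiles, vec), hour)

def reconstruction_error(profiles, vec):
    """Distance to the closest normal pattern, ignoring time context."""
    return min(euclidean(p, vec) for p in profiles.values())

def toy_density(hour, day):
    """Synthetic two-region density (an assumption, not real taxi data)."""
    jitter = 0.1 * (day - 1)
    return [10.0 + 5.0 * hour + jitter, 200.0 - 3.0 * hour - jitter]

# Three synthetic days of hourly observations.
train = [(h, toy_density(h, d)) for d in range(3) for h in range(24)]
profiles = fit_hour_profiles(train)

pattern = toy_density(8, 2)                                # typical 08:00 pattern
normal_score = temporal_anomaly_score(profiles, 8, pattern)   # seen at 08:00
shifted_score = temporal_anomaly_score(profiles, 3, pattern)  # seen at 03:00
global_score = reconstruction_error(profiles, pattern)

print(normal_score, shifted_score, global_score)
```

In this toy setting the same pattern gets a low reconstruction error in both cases, because it closely matches a known normal profile, while the hour prediction error is large only when the pattern appears at the wrong time. That is the kind of case the summary says reconstruction-based methods miss.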