Differentiable generative models for trajectory data analytics
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2020 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/137159 |
Institution: | Nanyang Technological University |
Summary: | With the proliferation of GPS-enabled devices, trajectory data is being generated at an unprecedented speed. The trajectories are typically represented as sequences of discrete sample points, which carry rich spatiotemporal information. Mining patterns and distilling knowledge from such a large amount of trajectory data could potentially help address many real-world problems and improve our daily life experience. For instance, accurately and efficiently quantifying the similarity between two trajectories is a foundation for many trajectory-based applications, such as tracking migration patterns of animals, mining hot routes in cities, trajectory clustering and moving group discovery.
In this dissertation, we seek to effectively and efficiently distill knowledge from trajectory data with differentiable generative models. We develop three flexible generative models with efficiency in mind. The resulting models are not only capable of revealing the useful patterns underlying the data, but also admit end-to-end training. Moreover, our methods scale easily to large real-world trajectory datasets. Specifically, we explore three important research problems arising in big trajectory data analytics: 1) learning representations for trajectory similarity computation; 2) learning the travel time distribution for any route on the road network; 3) spatial transition learning on the road network.
In the study of trajectory representation learning, we propose the first deep learning approach – t2vec – to learning trajectory representations that are robust to low data quality, thus supporting accurate and efficient trajectory similarity computation and search. Experiments show that our method achieves higher accuracy and is at least one order of magnitude faster than the state-of-the-art methods for k-nearest trajectory search.
In the study of travel time distribution learning, we develop a novel deep generative model – DeepGTT – to learn the travel time distribution for any route on the road network by conditioning on the real-time traffic. DeepGTT interprets the generation of travel time using a three-layer hierarchical probabilistic model that describes the generation process in a principled manner rather than simply learning by brute force; as a result, it not only produces more accurate results but is also quite data-efficient. A variational loss is further derived and the entire model is fully differentiable, which lets the model scale easily to large data sets.
In the study of spatial transition learning on the road network, we present a novel deep probabilistic model – DeepST – which unifies three explanatory factors for the route decision: the past traveled route, the impact of the destination, and real-time traffic. DeepST explains the generation of the next road link by conditioning on the representations of the three explanatory factors. To share statistical strength effectively, we propose to learn representations of k-destination proxies with an adjoint generative model. To incorporate the impact of real-time traffic, we introduce a high-dimensional latent variable as its representation, whose posterior distribution can then be inferred from observations. An efficient inference method is developed within the Variational Auto-Encoders framework to scale DeepST to large-scale data sets. |
---|
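The summary describes k-nearest trajectory search over learned trajectory representations. As a rough illustration only, and not the t2vec implementation, the sketch below assumes each trajectory has already been mapped to a fixed-length vector, so that search reduces to a nearest-neighbor lookup under Euclidean distance. The names `euclidean`, `k_nearest`, and `database`, the trajectory ids, and the three-dimensional vectors are all hypothetical.

```python
import heapq
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def k_nearest(query, database, k):
    """Return the ids of the k stored vectors closest to the query, nearest first."""
    return heapq.nsmallest(
        k, database, key=lambda tid: euclidean(query, database[tid])
    )


# Hypothetical representations keyed by trajectory id (assumed, not learned here).
database = {
    "traj_a": [0.0, 0.0, 0.0],
    "traj_b": [1.0, 0.0, 0.0],
    "traj_c": [0.0, 3.0, 4.0],
}

# Look up the two stored trajectories closest to a query representation.
nearest = k_nearest([0.9, 0.1, 0.0], database, k=2)
print(nearest)
```

This linear scan is for clarity only. A large collection would more likely use an approximate nearest-neighbor index, which the summary does not specify.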