Framework for stream clustering of trajectories based on temporal micro clustering technique

In recent years, spatio-temporal data has rapidly increased due to the tremendous prevalence of geolocation devices such as Global Positioning System, mobiles, and motion sensors. In some scenarios, spatio-temporal data are received in a streamed manner. Clustering of the stream data is a challen...

Full description

Saved in:
Bibliographic Details
Main Author: Abdulrazzaq, Musaab Riyadh
Format: Thesis
Language:English
Published: 2018
Subjects:
Online Access:http://psasir.upm.edu.my/id/eprint/76982/1/FSKTM%202018%2069%20-%20IR.pdf
http://psasir.upm.edu.my/id/eprint/76982/
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Universiti Putra Malaysia
Language: English
Description
Summary:In recent years, spatio-temporal data has rapidly increased due to the tremendous prevalence of geolocation devices such as Global Positioning System, mobiles, and motion sensors. In some scenarios, spatio-temporal data are received in a streamed manner. Clustering of the stream data is a challenging task due to single pass over the data, the unbounded size of data stream, and the limited processing time and memory space. However, it is a vital process for many applications such as traffic management, studying of animal behavior, and weather forecasting. Many existing algorithms for stream clustering of trajectory stream data exploit the time window technique which partition trajectory stream data into time-bins and cluster the segments in each timebin separately. Initiating clustering from scratch in each time-bin and not considering the relationships between the objects from two consecutive time-bins lead to creating redundant micro clusters centralizes in the border area between two adjacent time bins. This is because, the clustering process which exploits the time window technique creates new micro clusters for some trajectory segments at the start of each time-bin even though these segments are too close (within distance threshold) to the micro clusters at the end of previous time-bin . It is true that most similar micro clusters at two consecutive time-bins can be merged to reduce memory space but creating redundant micro clusters will dramatically slow down the clustering task. On the other hand, trajectories preprocessing such as segmentation and noise points filtering is a vital step which precedes the mining task. It aims to reduce the size of the trajectory to minimize clustering time. Most of the existing algorithms consider the noise filtering step precedes trajectory segmentation step which mean that there is no preprocessing method to partition trajectory into set of segments and remove noise points in real time with low computational cost. As a response towards the limitations of time window technique and the offline preprocessing methods, a framework for stream clustering of trajectories based on temporal micro clustering technique (SCT-TMC) is proposed. The framework consists of two stages: the preprocessing stage and the stream clustering stage. In the preprocessing stage, the On-Line Noise Filtering Algorithm for Trajectory Segmentation Based on Minimum Description Length concept (ONF_TRS) is proposed to achieve trajectory segmentation and remove noise points in real time with low computational cost. The minimum description length is an important concept in information theory and computational learning in which the best hypothesis for a given data set is the one that leads to the best compression of the data. The stream clustering stage consists of two phases: the online and the offline phases. In the online phase, the stream clustering algorithm for trajectories based on the lifespan of the cluster is proposed (CC_TRS) to overcome the limitations of the time window technique. The clustering algorithm consists of two components: the temporal micro-clusters generation and the temporal micro clusters merging. The temporal micro cluster data structure is proposed in CC_TRS algorithm to store the summarized information for each group of similar segments. The algorithm assigns a life time for each newly created temporal micro clusters so that the new incoming batch of trajectories segments will interact only with the non-expired temporal micro clusters. This is because the expired temporal micro clusters become far from segments at current time either spatially or temporally which in turn, minimizes the searching time for the nearest temporal micro cluster. When the size of the temporal micro clusters is exceeded a given memory space, the most similar temporal micro-clusters will be merged to save memory. On the other hand, the offline phase is evoked when the user requests to view the overall clustering results. The DBSCAN algorithm is used to perform the macro clustering task by replacing the distance between trajectories segments with the distance between the temporal micro-clusters. The DBSCAN is a density based clustering approach which is convenient for stream clustering due to the properties: firstly, it does not require to specify the number of clusters in advance. Secondly, the algorithm can find arbitrarily shaped clusters. Finally, it is robust to noise and outliers. The suggestion of the temporal micro cluster data structure in the online phase leads to the expansion of the functionality of the macro clustering phase. The proposed functions for the offline phase are the detection of spatial and spatiotemporal outliers and the macro clustering. A comprehensive experimental analysis was conducted to evaluate the efficiency and effectiveness of the proposed algorithms (ONF-TRS, CC-TRS). The results shows that the algorithm ONF-TRS has low computational cost and high compression rate as compared with the existing works. Furthermore, the CC-TRS improves and speeds up the clustering task as compared with the latest works.