An online density-based clustering algorithm for data stream based on local optimal radius and cluster pruning

Data stream clustering plays an important role in data stream mining for knowledge extraction. In recent years, numerous researchers have studied the online density-based clustering technique due to its capability to generate arbitrarily shaped clusters. The technique summarizes the data stream in m...

Bibliographic Details
Main Author: Islam, Md Kamrul
Format: Thesis
Language: English
Published: 2019
Subjects: QA75 Electronic computers. Computer science
Online Access: http://umpir.ump.edu.my/id/eprint/31091/1/An%20online%20density-based%20clustering%20algorithm%20for%20data%20stream%20based%20on%20local%20optimal%20radius%20and%20cluster%20pruning.wm.pdf
http://umpir.ump.edu.my/id/eprint/31091/
Institution: Universiti Malaysia Pahang
id my.ump.umpir.31091
record_format eprints
spelling my.ump.umpir.31091 2023-03-27T02:40:20Z QA75 Electronic computers. Computer science 2019-10 Thesis NonPeerReviewed pdf en Islam, Md Kamrul (2019) An online density-based clustering algorithm for data stream based on local optimal radius and cluster pruning. Masters thesis, Universiti Malaysia Pahang (Contributors, Thesis advisor: Zamli, Kamal Zuhairi).
institution Universiti Malaysia Pahang
building UMP Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Pahang
content_source UMP Institutional Repository
url_provider http://umpir.ump.edu.my/
language English
topic QA75 Electronic computers. Computer science
description Data stream clustering plays an important role in data stream mining for knowledge extraction. In recent years, numerous researchers have studied online density-based clustering because it can generate arbitrarily shaped clusters. The technique summarizes the data stream in micro-clusters, and the micro-clusters form the clusters. However, most of the existing algorithms are either not fully online or cannot handle the properties of a data stream properly. Moreover, they require the global optimal radius of the micro-clusters to be predefined, which is a difficult task, and an erroneous choice degrades cluster quality. In addition, the algorithms ignore temporarily irrelevant micro-clusters, which may become relevant again in the future. Ignoring them degrades clustering quality and increases processing time, because micro-clusters are deleted and re-created frequently as the data stream evolves. In this study, a fully online density-based clustering algorithm called Buffer-based Online Clustering for Evolving Data Stream (BOCEDS) is presented. BOCEDS clusters the data stream in a single stage. The algorithm summarizes the data from the data stream in micro-clusters and maintains the local optimal radius of the micro-clusters rather than a global, constant radius. Moreover, it introduces a buffer for storing irrelevant micro-clusters and a fully online pruning process for extracting temporarily irrelevant micro-clusters from the buffer. The pruning process improves processing time. In addition, BOCEDS proposes an online micro-cluster energy updating function based on the spatial information of the data stream. Then, clustering graphs are generated based on the connectivity among micro-clusters, and the clusters are generated from the clustering graphs. To evaluate its performance, the BOCEDS algorithm is executed on two synthetic data streams and one practical data stream.
The experimental results show that BOCEDS is able to generate new clusters and remove outdated clusters over time as the data stream contents change. The experiment on a noisy data stream shows that the BOCEDS algorithm can detect noise with an accuracy of approximately 100%. The overall clustering accuracy and purity are more than 99%. The experimental results are compared with those of alternative online/offline hybrid density-based clustering algorithms. The average processing time per data point in the data stream is about 2 milliseconds, which is much lower than that of the comparable clustering algorithms in the literature. The algorithm is also more scalable to high-dimensional data streams than the existing algorithms. The sensitivity of the clustering parameters in BOCEDS is also measured. The results show that when the parameter values are changed, the cluster quality deviates by a very small amount (<1%). These results demonstrate the superiority of the BOCEDS algorithm over the existing clustering algorithms. The BOCEDS algorithm is then applied to real-world weather data streams to demonstrate its capability to detect drifts in the data stream and discover arbitrarily shaped clusters.
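The description above names the main ingredients (micro-clusters with a per-cluster radius, an energy value that decays over time, a buffer for temporarily irrelevant micro-clusters, and clusters read off a connectivity graph) but this record does not give the actual update rules. The following Python sketch is therefore only a toy illustration of those ideas and is not the BOCEDS algorithm from the thesis. Every name, formula and parameter in it (`ToyStreamClusterer`, the linear energy decay, the radius smoothing rule, `buffer_life`) is a hypothetical assumption made for the example.

```python
import math
from dataclasses import dataclass


@dataclass(eq=False)  # identity-based equality, so list.remove() targets the exact object
class MicroCluster:
    center: list
    radius: float        # per-micro-cluster ("local") radius
    energy: float = 1.0  # reset on every hit, decays otherwise
    count: int = 1


class ToyStreamClusterer:
    """Illustrative only: all update rules below are assumptions, not the thesis's."""

    def __init__(self, init_radius=1.0, min_radius=0.5, max_radius=2.0,
                 decay=0.25, buffer_life=1.0):
        self.init_radius = init_radius
        self.min_radius = min_radius
        self.max_radius = max_radius
        self.decay = decay              # energy lost per arriving point
        self.buffer_life = buffer_life  # extra energy debt tolerated in the buffer
        self.active = []                # micro-clusters that currently form clusters
        self.buffer = []                # temporarily irrelevant micro-clusters

    @staticmethod
    def _nearest_covering(pool, x):
        # Closest micro-cluster whose own radius covers x, or None.
        hits = [mc for mc in pool if math.dist(mc.center, x) <= mc.radius]
        return min(hits, key=lambda mc: math.dist(mc.center, x), default=None)

    def insert(self, x):
        # 1. Every micro-cluster loses energy when a new point arrives.
        for mc in self.active + self.buffer:
            mc.energy -= self.decay
        # 2. Drop buffered micro-clusters that stayed irrelevant for too long,
        #    then move exhausted active micro-clusters into the buffer.
        self.buffer = [mc for mc in self.buffer if mc.energy > -self.buffer_life]
        self.buffer += [mc for mc in self.active if mc.energy <= 0]
        self.active = [mc for mc in self.active if mc.energy > 0]
        # 3. Look for a covering micro-cluster: active first, then the buffer.
        mc = self._nearest_covering(self.active, x)
        if mc is None:
            mc = self._nearest_covering(self.buffer, x)
            if mc is not None:          # revive it instead of creating a new one
                self.buffer.remove(mc)
                self.active.append(mc)
        if mc is None:
            self.active.append(MicroCluster(list(x), self.init_radius))
            return
        # 4. Update the hit micro-cluster: count, energy, mean center, local radius.
        d = math.dist(mc.center, x)
        mc.count += 1
        mc.energy = 1.0
        mc.center = [c + (xi - c) / mc.count for c, xi in zip(mc.center, x)]
        smoothed = 0.9 * mc.radius + 0.1 * 2 * d
        mc.radius = min(self.max_radius, max(self.min_radius, smoothed))

    def clusters(self):
        # Connected components of the graph linking overlapping active micro-clusters.
        n, seen, groups = len(self.active), set(), []
        for i in range(n):
            if i in seen:
                continue
            seen.add(i)
            stack, group = [i], []
            while stack:
                j = stack.pop()
                a = self.active[j]
                group.append(a)
                for k in range(n):
                    b = self.active[k]
                    if k not in seen and math.dist(a.center, b.center) <= a.radius + b.radius:
                        seen.add(k)
                        stack.append(k)
            groups.append(group)
        return groups
```

In this sketch a micro-cluster that stops receiving points is parked in the buffer rather than deleted, so a later point in the same region revives it with its accumulated count instead of starting a new micro-cluster from scratch. That is the intuition the description gives for the buffer; the real criteria used in the thesis may differ.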
format Thesis
author Islam, Md Kamrul
title An online density-based clustering algorithm for data stream based on local optimal radius and cluster pruning
publishDate 2019
url http://umpir.ump.edu.my/id/eprint/31091/1/An%20online%20density-based%20clustering%20algorithm%20for%20data%20stream%20based%20on%20local%20optimal%20radius%20and%20cluster%20pruning.wm.pdf
http://umpir.ump.edu.my/id/eprint/31091/
_version_ 1761616638710382592