An online density-based clustering algorithm for data stream based on local optimal radius and cluster pruning
Data stream clustering plays an important role in data stream mining for knowledge extraction. In recent years, numerous researchers have studied the online density-based clustering technique due to its capability to generate arbitrarily shaped clusters. The technique summarizes the data stream in m...
Saved in:
Main Author: | |
---|---|
Format: | Thesis |
Language: | English |
Published: |
2019
|
Subjects: | |
Online Access: | http://umpir.ump.edu.my/id/eprint/31091/1/An%20online%20density-based%20clustering%20algorithm%20for%20data%20stream%20based%20on%20local%20optimal%20radius%20and%20cluster%20pruning.wm.pdf http://umpir.ump.edu.my/id/eprint/31091/ |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Universiti Malaysia Pahang |
Language: | English |
id |
my.ump.umpir.31091 |
---|---|
record_format |
eprints |
spelling |
my.ump.umpir.310912023-03-27T02:40:20Z http://umpir.ump.edu.my/id/eprint/31091/ An online density-based clustering algorithm for data stream based on local optimal radius and cluster pruning Islam, Md Kamrul QA75 Electronic computers. Computer science Data stream clustering plays an important role in data stream mining for knowledge extraction. In recent years, numerous researchers have studied the online density-based clustering technique due to its capability to generate arbitrarily shaped clusters. The technique summarizes the data stream in micro-clusters and the micro-clusters form the clusters. However, most of the clusters are either not fully online, or cannot handle the properties of data stream properly. Moreover, the algorithms require predefining the global optimal radius of micro-clusters, which is a difficult task, and an erroneous choice deteriorates the cluster quality. In addition, the algorithms ignore the presence of temporarily irrelevant micro-clusters, which may be relevant in the future. This ignorance causes the degradation of clustering quality and the increase of the processing time as micro-clusters are deleted and created frequently due to evolving nature of data stream. In this study, a fully online density-based clustering algorithm called Buffer-based Online Clustering for Evolving Data Stream (BOCEDS) is presented. BOCEDS clusters the data stream in a single stage. The algorithm summarizes the data from data stream in micro-clusters. This algorithm maintains the local optimal radius of micro-clusters rather than a global and constant radius. Moreover, it introduces a buffer for storing irrelevant micro-clusters and a fully online pruning process for extracting the temporarily irrelevant micro-cluster from the buffer. The pruning process improves processing time. In addition, BOCEDS proposes an online micro-cluster energy updating function based on the spatial information of the data stream. Then, clustering graphs are generated based on the connectivity among micro-clusters. The clusters are generated from the clustering graphs. To evaluate the performance, BOCEDS algorithm is executed on two syntactic and one practical data streams. The experimental result shows BOCEDS is able to generate new clusters and remove outdated clusters with time as data stream contents change. The experiment on noisy data stream shows that BOCEDS algorithm can detect noise with an accuracy of approximately 100%. The overall clustering accuracy and purity are more than 99%. Experimental results are compared with other alternative online/offline hybrid density-based clustering algorithms. The average processing time for data point in the data stream is about 2 milliseconds which is much lower than the aligned clustering algorithms in literature. The algorithm is also more scalable to high dimensional data stream than the existing algorithms. The sensitivity of clustering parameters in BOCEDS is also measured. The result shows that in case of changing the values of parameters the cluster quality deviates by a very small amount (<1%). These results prove the superiority of BOCEDS algorithm over the existing clustering algorithms. The BOCEDS algorithm is then applied to real-world weather data streams to demonstrate its capability to detect the drifts in the data stream and discover arbitrarily shaped clusters. 2019-10 Thesis NonPeerReviewed pdf en http://umpir.ump.edu.my/id/eprint/31091/1/An%20online%20density-based%20clustering%20algorithm%20for%20data%20stream%20based%20on%20local%20optimal%20radius%20and%20cluster%20pruning.wm.pdf Islam, Md Kamrul (2019) An online density-based clustering algorithm for data stream based on local optimal radius and cluster pruning. Masters thesis, Universiti Malaysia Pahang (Contributors, Thesis advisor: Zamli, Kamal Zuhairi). |
institution |
Universiti Malaysia Pahang |
building |
UMP Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Malaysia Pahang |
content_source |
UMP Institutional Repository |
url_provider |
http://umpir.ump.edu.my/ |
language |
English |
topic |
QA75 Electronic computers. Computer science |
spellingShingle |
QA75 Electronic computers. Computer science Islam, Md Kamrul An online density-based clustering algorithm for data stream based on local optimal radius and cluster pruning |
description |
Data stream clustering plays an important role in data stream mining for knowledge extraction. In recent years, numerous researchers have studied the online density-based clustering technique due to its capability to generate arbitrarily shaped clusters. The technique summarizes the data stream in micro-clusters and the micro-clusters form the clusters. However, most of the clusters are either not fully online, or cannot handle the properties of data stream properly. Moreover, the algorithms require predefining the global optimal radius of micro-clusters, which is a difficult task, and an erroneous choice deteriorates the cluster quality. In addition, the algorithms ignore the presence of temporarily irrelevant micro-clusters, which may be relevant in the future. This ignorance causes the degradation of clustering quality and the increase of the processing time as micro-clusters are deleted and created frequently due to evolving nature of data stream. In this study, a fully online density-based clustering algorithm called Buffer-based Online Clustering for Evolving Data Stream (BOCEDS) is presented. BOCEDS clusters the data stream in a single stage. The algorithm summarizes the data from data stream in micro-clusters. This algorithm maintains the local optimal radius of micro-clusters rather than a global and constant radius. Moreover, it introduces a buffer for storing irrelevant micro-clusters and a fully online pruning process for extracting the temporarily irrelevant micro-cluster from the buffer. The pruning process improves processing time. In addition, BOCEDS proposes an online micro-cluster energy updating function based on the spatial information of the data stream. Then, clustering graphs are generated based on the connectivity among micro-clusters. The clusters are generated from the clustering graphs. To evaluate the performance, BOCEDS algorithm is executed on two syntactic and one practical data streams. The experimental result shows BOCEDS is able to generate new clusters and remove outdated clusters with time as data stream contents change. The experiment on noisy data stream shows that BOCEDS algorithm can detect noise with an accuracy of approximately 100%. The overall clustering accuracy and purity are more than 99%. Experimental results are compared with other alternative online/offline hybrid density-based clustering algorithms. The average processing time for data point in the data stream is about 2 milliseconds which is much lower than the aligned clustering algorithms in literature. The algorithm is also more scalable to high dimensional data stream than the existing algorithms. The sensitivity of clustering parameters in BOCEDS is also measured. The result shows that in case of changing the values of parameters the cluster quality deviates by a very small amount (<1%). These results prove the superiority of BOCEDS algorithm over the existing clustering algorithms. The BOCEDS algorithm is then applied to real-world weather data streams to demonstrate its capability to detect the drifts in the data stream and discover arbitrarily shaped clusters. |
format |
Thesis |
author |
Islam, Md Kamrul |
author_facet |
Islam, Md Kamrul |
author_sort |
Islam, Md Kamrul |
title |
An online density-based clustering algorithm for data stream based on local optimal radius and cluster pruning |
title_short |
An online density-based clustering algorithm for data stream based on local optimal radius and cluster pruning |
title_full |
An online density-based clustering algorithm for data stream based on local optimal radius and cluster pruning |
title_fullStr |
An online density-based clustering algorithm for data stream based on local optimal radius and cluster pruning |
title_full_unstemmed |
An online density-based clustering algorithm for data stream based on local optimal radius and cluster pruning |
title_sort |
online density-based clustering algorithm for data stream based on local optimal radius and cluster pruning |
publishDate |
2019 |
url |
http://umpir.ump.edu.my/id/eprint/31091/1/An%20online%20density-based%20clustering%20algorithm%20for%20data%20stream%20based%20on%20local%20optimal%20radius%20and%20cluster%20pruning.wm.pdf http://umpir.ump.edu.my/id/eprint/31091/ |
_version_ |
1761616638710382592 |