Data stream clustering by divide and conquer approach based on vector model

Recently, many researchers have focused on data stream processing as an efficient method for extracting knowledge from big data. Data stream clustering is an unsupervised approach that is employed for huge data. The continuous effort on data stream clustering method has one common goal which is to a...

Bibliographic Details
Main Authors: Khalilian, Madjid, Mustapha, Norwati, Sulaiman, Nasir
Format: Article
Language: English
Published: Springer 2016
Online Access: http://psasir.upm.edu.my/id/eprint/55419/1/Data%20stream%20clustering%20by%20divide.pdf
http://psasir.upm.edu.my/id/eprint/55419/
Institution: Universiti Putra Malaysia
id my.upm.eprints.55419
record_format eprints
spelling my.upm.eprints.55419 2017-10-02T07:53:29Z http://psasir.upm.edu.my/id/eprint/55419/ Springer 2016 Article PeerReviewed application/pdf en http://psasir.upm.edu.my/id/eprint/55419/1/Data%20stream%20clustering%20by%20divide.pdf Khalilian, Madjid and Mustapha, Norwati and Sulaiman, Nasir (2016) Data stream clustering by divide and conquer approach based on vector model. Journal of Big Data, 3 (1). pp. 1-21. ISSN 2196-1115 DOI 10.1186/s40537-015-0036-x
institution Universiti Putra Malaysia
building UPM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Putra Malaysia
content_source UPM Institutional Repository
url_provider http://psasir.upm.edu.my/
language English
description Many researchers have recently focused on data stream processing as an efficient way to extract knowledge from big data. Data stream clustering is an unsupervised approach suited to very large datasets. Ongoing work on data stream clustering methods shares one common goal: an accurate clustering algorithm. However, previous work on data stream clustering has overlooked several issues: (1) clustering datasets that include large segments of repetitive data, (2) monitoring the clustering structure of ordinal data streams, and (3) determining important parameters such as k, the exact number of clusters in a data stream. This paper proposes the DCSTREAM method to address these issues, clustering big datasets using the vector model and a k-Means divide and conquer approach. Experimental results show that DCSTREAM can achieve superior quality and performance compared with the STREAM and ConStream methods on abrupt and gradual real-world datasets. The results also show that batch processing in DCSTREAM and ConStream is time consuming compared with STREAM, but it avoids further analysis for detecting outliers and novel micro-clusters.
format Article
author Khalilian, Madjid
Mustapha, Norwati
Sulaiman, Nasir
title Data stream clustering by divide and conquer approach based on vector model
publisher Springer
publishDate 2016
url http://psasir.upm.edu.my/id/eprint/55419/1/Data%20stream%20clustering%20by%20divide.pdf
http://psasir.upm.edu.my/id/eprint/55419/
_version_ 1643835889778950144