Incremental fuzzy clustering with multiple medoids for large data
Main Authors: | , , |
---|---|
Other Authors: | |
Format: | Article |
Language: | English |
Published: | 2015 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/106736 http://hdl.handle.net/10220/25085 http://dx.doi.org/10.1109/TFUZZ.2014.2298244 |
Institution: | Nanyang Technological University |
Summary: | Clustering, an important data analysis technique, plays a key role in finding the underlying pattern structure embedded in unlabelled data. Clustering algorithms that need to hold the entire data set in memory become infeasible when the data set is too large to be stored. Incremental clustering approaches have been proposed to handle such large data. The key idea of these approaches is to find representatives (centroids or medoids) for each cluster in each data chunk, a chunk being a packet of the data, and to carry out the final data analysis on the representatives identified from all the chunks. In this paper, we propose a new incremental clustering approach called incremental multiple medoids based fuzzy clustering (IMMFC) to handle complex patterns that are not compact and well separated. We investigate whether IMMFC is a good alternative for capturing the underlying data structure more accurately. IMMFC not only facilitates the selection of multiple medoids for each cluster in a data chunk, but also has a mechanism that uses the relationships among those identified medoids as side information to help the final data clustering process. The detailed problem formulation, the derivation of the updating rules, and an in-depth analysis of the proposed IMMFC are provided. Experimental studies have been conducted on several large data sets, including real-world malware data sets. IMMFC outperforms existing incremental fuzzy clustering approaches in terms of clustering accuracy and robustness to the order of data. These results demonstrate the great potential of IMMFC for large data analysis. |
---|
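
The chunk-and-representatives idea described in the summary can be illustrated with a minimal sketch. This is not the IMMFC algorithm: it swaps in a plain hard-assignment k-medoids routine in place of the paper's fuzzy updating rules, and it ignores the relationships among medoids that IMMFC uses as side information. All function and parameter names (`k_medoids`, `incremental_medoids`, `chunk_size`, `medoids_per_chunk`, `n_clusters`) are hypothetical and chosen only for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_medoids(points, k, iters=20, seed=0):
    """Plain alternating k-medoids with hard assignments.

    Assumption: a stand-in for the paper's fuzzy updates, not a
    reproduction of them.
    """
    rng = random.Random(seed)
    medoids = rng.sample(points, k)
    for _ in range(iters):
        # Assign every point to its nearest current medoid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, medoids[i]))
            clusters[j].append(p)
        # Re-pick each medoid as the member with the smallest total
        # distance to the other members of its cluster.
        new_medoids = []
        for j, members in enumerate(clusters):
            if not members:
                new_medoids.append(medoids[j])  # keep the old medoid
                continue
            best = min(members, key=lambda c: sum(sq_dist(c, q) for q in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids


def incremental_medoids(data, chunk_size, medoids_per_chunk, n_clusters):
    """Process data one chunk at a time, keeping only medoids in memory."""
    representatives = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        k = min(medoids_per_chunk, len(chunk))
        representatives.extend(k_medoids(chunk, k))
    # Final analysis is run on the collected representatives only.
    return k_medoids(representatives, min(n_clusters, len(representatives)))
```

Only one chunk and the accumulated representatives are held at any time, which is what lets this style of approach scale to data that does not fit in memory. Choosing several medoids per chunk, rather than one per cluster, loosely mirrors the summary's point that multiple medoids can describe clusters that are not compact.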