Incremental fuzzy clustering with multiple medoids for large data

Clustering is an important data analysis technique for finding the underlying pattern structure in unlabelled data. Clustering algorithms that must hold the entire data set in memory become infeasible when the data set is too large to be s...

Bibliographic Details
Main Authors: Wang, Yangtao, Chen, Lihui, Mei, Jian-Ping
Other Authors: School of Electrical and Electronic Engineering
Format: Article
Language: English
Published: 2015
Subjects: DRNTU::Engineering::Computer science and engineering::Data
Online Access: https://hdl.handle.net/10356/106736
http://hdl.handle.net/10220/25085
http://dx.doi.org/10.1109/TFUZZ.2014.2298244
Institution: Nanyang Technological University
id sg-ntu-dr.10356-106736
record_format dspace
version Accepted version
date 2015-02-24T08:18:26Z, 2019-12-06T22:17:16Z
issued 2014
type Journal Article
citation Wang, Y., Chen, L., & Mei, J.-P. (2014). Incremental fuzzy clustering with multiple medoids for large data. IEEE Transactions on Fuzzy Systems, 22(6), 1557-1568.
issn 1063-6706
rights © 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: http://dx.doi.org/10.1109/TFUZZ.2014.2298244.
file_format application/pdf
institution Nanyang Technological University
building NTU Library
country Singapore
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Data
description Clustering is an important data analysis technique for finding the underlying pattern structure in unlabelled data. Clustering algorithms that must hold the entire data set in memory become infeasible when the data set is too large to be stored. Incremental clustering approaches have been proposed to handle such large data. Their key idea is to find representatives (centroids or medoids) for each cluster in each data chunk, where a chunk is a packet of the data, and to carry out the final data analysis on the representatives identified from all the chunks. In this paper, we propose a new incremental clustering approach called incremental multiple medoids based fuzzy clustering (IMMFC) to handle complex patterns that are not compact and well separated. We investigate whether IMMFC is a good alternative for capturing the underlying data structure more accurately. IMMFC not only facilitates the selection of multiple medoids for each cluster in a data chunk, but also uses the relationships among the identified medoids as side information to help the final data clustering process. The detailed problem formulation, the derivation of the updating rules, and an in-depth analysis of the proposed IMMFC are provided. Experimental studies have been conducted on several large data sets, including real-world malware data sets. IMMFC outperforms existing incremental fuzzy clustering approaches in terms of clustering accuracy and robustness to the order of data. These results demonstrate the great potential of IMMFC for large data analysis.
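The generic chunk-and-representatives idea summarised in the description can be illustrated with a minimal sketch. This is not the IMMFC algorithm: it assumes a plain, crisp (non-fuzzy) alternating k-medoids, a single medoid per cluster, Euclidean distance, and no side information between medoids. The function names `weighted_k_medoids` and `incremental_medoid_clustering` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def weighted_k_medoids(points, weights, k, n_iter=20, seed=0):
    """Crisp alternating k-medoids on weighted points (a simplification, not IMMFC)."""
    rng = np.random.default_rng(seed)
    n = len(points)
    # Pairwise Euclidean distances; acceptable because one chunk fits in memory.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if a cluster is empty
            # New medoid: the member with the smallest weighted distance to its cluster.
            cost = dist[np.ix_(members, members)] @ weights[members]
            new_medoids[c] = members[np.argmin(cost)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    # Total weight assigned to each medoid, carried forward to the final pass.
    mass = np.array([weights[labels == c].sum() for c in range(k)])
    return medoids, mass


def incremental_medoid_clustering(chunks, k):
    """Cluster each chunk separately, then cluster the collected representatives."""
    reps, rep_weights = [], []
    for chunk in chunks:  # each chunk must contain at least k points
        idx, mass = weighted_k_medoids(chunk, np.ones(len(chunk)), k)
        reps.append(chunk[idx])
        rep_weights.append(mass)
    reps = np.vstack(reps)
    rep_weights = np.concatenate(rep_weights)
    # Final analysis uses only the representatives, never the full data set.
    final_idx, _ = weighted_k_medoids(reps, rep_weights, k)
    return reps[final_idx]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = np.vstack([rng.normal(0, 0.3, (200, 2)), rng.normal(5, 0.3, (200, 2))])
    rng.shuffle(data)
    print(incremental_medoid_clustering(np.array_split(data, 4), k=2))
```

Only one chunk and the accumulated representatives are held in memory at any time, which is the property the description attributes to incremental approaches in general. Selecting several medoids per cluster and exploiting relationships among them, as the description says IMMFC does, is deliberately left out of this sketch.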