Tune Up Fuzzy C-Means for Big Data: Some Novel Hybrid Clustering Algorithms Based on Initial Selection and Incremental Clustering

Data are getting larger, and most of them are necessary for our businesses. Rapid explosion of data brings us a number of challenges relating to its complexity and how the most important knowledge can be captured in reasonable time. Fuzzy C-means (FCM)—one of the most efficient clustering algorithm...

Full description

Saved in:
Bibliographic Details
Main Authors: Le, Hoang Son, Nguyen, Dang Tien
Format: Article
Language:English
Published: Taiwan Fuzzy Systems Association and Springer-Verlag Berlin Heidelberg 2019
Subjects:
Online Access:http://repository.vnu.edu.vn/handle/VNU_123/64979
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Vietnam National University, Hanoi
Language: English
id oai:112.137.131.14:VNU_123-64979
record_format dspace
spelling oai:112.137.131.14:VNU_123-649792019-07-15T04:01:37Z Tune Up Fuzzy C-Means for Big Data: Some Novel Hybrid Clustering Algorithms Based on Initial Selection and Incremental Clustering Le, Hoang Son Nguyen, Dang Tien Big Data Hybrid Clustering Algorithms Initial Selection Incremental Clustering Data are getting larger, and most of them are necessary for our businesses. Rapid explosion of data brings us a number of challenges relating to its complexity and how the most important knowledge can be captured in reasonable time. Fuzzy C-means (FCM)—one of the most efficient clustering algorithms which have been widely used in pattern recognition, data compression, image segmentation, computer vision and many other fields—also faces the problem of processing large datasets. In this paper, we propose some novel hybrid clustering algorithms based on incremental clustering and initial selection to tune up FCM for the Big Data problem. The first algorithm determines meshes of rectangle covering data points as the representatives, while the second one considers data points that have high influence to others as the representatives. The representatives are then clustered by FCM, and the new centers are selected as initial ones for clustering of the dataset. Theoretical analyses of the new algorithms including comparison of quality of solutions when clustering the representatives set versus the entire set are examined. The experimental results on both simulated and real datasets show that total computational time of the new methods including time of finding representatives and clustering is faster than those of other relevant algorithms. The validation on clustering quality is also examined. The findings of this paper have great impact and significance to researches in the fields of soft computing and Big Data processing. It is obvious that computing methodologies nowadays are facing with huge amount of diverse and complex data structures. Speed of processing is the main priority when considering effectiveness of a specific method. The findings demonstrated practical algorithms and investigated their characteristics that could be referenced by other researchers in similar applications. The usefulness and significance of this research are clearly demonstrated within the extent of real-life applications. 2019-07-15T04:01:37Z 2019-07-15T04:01:37Z 2017 Article Le, H. S., & Nguyen, D. T. (2017). Tune Up Fuzzy C-Means for Big Data: Some Novel Hybrid Clustering Algorithms Based on Initial Selection and Incremental Clustering. International Journal of Fuzzy Systems, 19(5), 1585-1602. http://repository.vnu.edu.vn/handle/VNU_123/64979 10.1007/s40815-016-0260-3 en International Journal of Fuzzy Systems; application/pdf Taiwan Fuzzy Systems Association and Springer-Verlag Berlin Heidelberg
institution Vietnam National University, Hanoi
building VNU Library & Information Center
country Vietnam
collection VNU Digital Repository
language English
topic Big Data
Hybrid Clustering Algorithms
Initial Selection
Incremental Clustering
spellingShingle Big Data
Hybrid Clustering Algorithms
Initial Selection
Incremental Clustering
Le, Hoang Son
Nguyen, Dang Tien
Tune Up Fuzzy C-Means for Big Data: Some Novel Hybrid Clustering Algorithms Based on Initial Selection and Incremental Clustering
description Data are getting larger, and most of them are necessary for our businesses. Rapid explosion of data brings us a number of challenges relating to its complexity and how the most important knowledge can be captured in reasonable time. Fuzzy C-means (FCM)—one of the most efficient clustering algorithms which have been widely used in pattern recognition, data compression, image segmentation, computer vision and many other fields—also faces the problem of processing large datasets. In this paper, we propose some novel hybrid clustering algorithms based on incremental clustering and initial selection to tune up FCM for the Big Data problem. The first algorithm determines meshes of rectangle covering data points as the representatives, while the second one considers data points that have high influence to others as the representatives. The representatives are then clustered by FCM, and the new centers are selected as initial ones for clustering of the dataset. Theoretical analyses of the new algorithms including comparison of quality of solutions when clustering the representatives set versus the entire set are examined. The experimental results on both simulated and real datasets show that total computational time of the new methods including time of finding representatives and clustering is faster than those of other relevant algorithms. The validation on clustering quality is also examined. The findings of this paper have great impact and significance to researches in the fields of soft computing and Big Data processing. It is obvious that computing methodologies nowadays are facing with huge amount of diverse and complex data structures. Speed of processing is the main priority when considering effectiveness of a specific method. The findings demonstrated practical algorithms and investigated their characteristics that could be referenced by other researchers in similar applications. The usefulness and significance of this research are clearly demonstrated within the extent of real-life applications.
format Article
author Le, Hoang Son
Nguyen, Dang Tien
author_facet Le, Hoang Son
Nguyen, Dang Tien
author_sort Le, Hoang Son
title Tune Up Fuzzy C-Means for Big Data: Some Novel Hybrid Clustering Algorithms Based on Initial Selection and Incremental Clustering
title_short Tune Up Fuzzy C-Means for Big Data: Some Novel Hybrid Clustering Algorithms Based on Initial Selection and Incremental Clustering
title_full Tune Up Fuzzy C-Means for Big Data: Some Novel Hybrid Clustering Algorithms Based on Initial Selection and Incremental Clustering
title_fullStr Tune Up Fuzzy C-Means for Big Data: Some Novel Hybrid Clustering Algorithms Based on Initial Selection and Incremental Clustering
title_full_unstemmed Tune Up Fuzzy C-Means for Big Data: Some Novel Hybrid Clustering Algorithms Based on Initial Selection and Incremental Clustering
title_sort tune up fuzzy c-means for big data: some novel hybrid clustering algorithms based on initial selection and incremental clustering
publisher Taiwan Fuzzy Systems Association and Springer-Verlag Berlin Heidelberg
publishDate 2019
url http://repository.vnu.edu.vn/handle/VNU_123/64979
_version_ 1680966889198583808