Tune Up Fuzzy C-Means for Big Data: Some Novel Hybrid Clustering Algorithms Based on Initial Selection and Incremental Clustering

Data sets are growing larger, and most of the data are essential to business. The rapid explosion of data brings a number of challenges relating to its complexity and to how the most important knowledge can be captured in reasonable time. Fuzzy C-means (FCM), one of the most efficient clustering algorithm...

Bibliographic Details
Main Authors:	Le, Hoang Son; Nguyen, Dang Tien
Format:	Article
Language:	English
Published:	Taiwan Fuzzy Systems Association and Springer-Verlag Berlin Heidelberg 2019
Subjects:	Big Data; Hybrid Clustering Algorithms; Initial Selection; Incremental Clustering
Online Access:	http://repository.vnu.edu.vn/handle/VNU_123/64979
Institution:	Vietnam National University, Hanoi

id	oai:112.137.131.14:VNU_123-64979
record_format	dspace
spelling	Le, H. S., & Nguyen, D. T. (2017). Tune Up Fuzzy C-Means for Big Data: Some Novel Hybrid Clustering Algorithms Based on Initial Selection and Incremental Clustering. International Journal of Fuzzy Systems, 19(5), 1585-1602. doi:10.1007/s40815-016-0260-3. http://repository.vnu.edu.vn/handle/VNU_123/64979. Language: en. File format: application/pdf. Record date: 2019-07-15T04:01:37Z.
institution	Vietnam National University, Hanoi
building	VNU Library & Information Center
country	Vietnam
collection	VNU Digital Repository
language	English
topic	Big Data; Hybrid Clustering Algorithms; Initial Selection; Incremental Clustering
description	Data sets are growing larger, and most of the data are essential to business. The rapid explosion of data brings a number of challenges relating to its complexity and to how the most important knowledge can be captured in reasonable time. Fuzzy C-means (FCM), one of the most efficient clustering algorithms and widely used in pattern recognition, data compression, image segmentation, computer vision and many other fields, also faces the problem of processing large datasets. In this paper, we propose some novel hybrid clustering algorithms based on incremental clustering and initial selection to tune up FCM for the Big Data problem. The first algorithm takes rectangular meshes covering the data points as the representatives, while the second takes data points that have high influence on others as the representatives. The representatives are then clustered by FCM, and the resulting centers are used as the initial centers for clustering the entire dataset. Theoretical analyses of the new algorithms are presented, including a comparison of solution quality when clustering the representative set versus the entire set. The experimental results on both simulated and real datasets show that the total computational time of the new methods, including the time to find representatives and to cluster, is lower than that of other relevant algorithms. Clustering quality is also validated. The findings of this paper are significant for research in soft computing and Big Data processing. Computing methodologies now face huge amounts of diverse and complex data structures, and processing speed is the main priority when assessing the effectiveness of a specific method. The paper demonstrates practical algorithms and investigates their characteristics, which other researchers can reference in similar applications. The usefulness and significance of this research are demonstrated in real-life applications.
format	Article
author	Le, Hoang Son; Nguyen, Dang Tien
title	Tune Up Fuzzy C-Means for Big Data: Some Novel Hybrid Clustering Algorithms Based on Initial Selection and Incremental Clustering
publisher	Taiwan Fuzzy Systems Association and Springer-Verlag Berlin Heidelberg
publishDate	2019
url	http://repository.vnu.edu.vn/handle/VNU_123/64979
_version_	1680966889198583808
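
The two-stage idea summarized in the description (cluster a small set of representatives with FCM, then reuse the resulting centers as the initial centers for FCM on the entire dataset) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the record does not give the mesh construction or the influence-based selection, so representatives here are simply the mean point of each occupied cell of a uniform grid. The names fcm, grid_representatives, two_stage_fcm and the parameter cells_per_dim are hypothetical.

```python
import numpy as np


def fcm(X, c, m=2.0, init_centers=None, max_iter=100, tol=1e-5, seed=0):
    """Plain fuzzy c-means. Returns (centers, memberships, iterations used)."""
    rng = np.random.default_rng(seed)
    if init_centers is None:
        # No warm start: pick c distinct data points as initial centers.
        centers = X[rng.choice(len(X), size=c, replace=False)]
    else:
        centers = np.asarray(init_centers, dtype=float)
    for it in range(1, max_iter + 1):
        # Squared distance from every point to every center, shape (n, c).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)), rows sum to 1.
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Center update: mean of the data weighted by u_ik^m.
        W = U ** m
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers, U, it


def grid_representatives(X, cells_per_dim=10):
    """ASSUMPTION: stand-in for the paper's mesh step.
    Returns the mean point of each occupied cell of a uniform grid."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    cell = np.minimum(((X - lo) / span * cells_per_dim).astype(int),
                      cells_per_dim - 1)
    _, inverse = np.unique(cell, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    counts = np.bincount(inverse)
    sums = np.zeros((len(counts), X.shape[1]))
    np.add.at(sums, inverse, X)  # accumulate the points falling in each cell
    return sums / counts[:, None]


def two_stage_fcm(X, c, cells_per_dim=10, **fcm_kwargs):
    """Stage 1: FCM on the representatives. Stage 2: FCM on all of X,
    started from the stage-1 centers."""
    reps = grid_representatives(X, cells_per_dim)
    if len(reps) < c:
        raise ValueError("fewer representatives than clusters; use a finer grid")
    rep_centers, _, _ = fcm(reps, c, **fcm_kwargs)
    return fcm(X, c, init_centers=rep_centers, **fcm_kwargs)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    blobs = ([0, 0], [4, 4], [0, 4])
    X = np.vstack([rng.normal(loc, 0.3, size=(500, 2)) for loc in blobs])
    centers, U, iterations = two_stage_fcm(X, c=3)
    print(centers.round(2), iterations)
```

The expensive pass over the entire dataset starts from centers found on a much smaller set, which is where any time saving would come from. Whether the saving holds in practice depends on how the representatives are chosen, which is the subject of the paper itself.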

Similar Items