A quality metric for K-Means clustering based on centroid locations
The K-Means clustering algorithm does not offer a clear methodology to determine the appropriate number of clusters; it does not have a built-in mechanism for K selection. In this paper, we present a new metric for clustering quality and describe its use for K selection. The proposed metric, based on th...
Main Author: | THULASIDAS, Manoj |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2022 |
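Subjects: | K-Means clustering; Quality metrics; K selection problem; Number of clusters; Computer Engineering; Numerical Analysis and Scientific Computing; Theory and Algorithms |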
Online Access: | https://ink.library.smu.edu.sg/sis_research/7744 https://ink.library.smu.edu.sg/context/sis_research/article/8747/viewcontent/A_quality_metric_for_k_means_clustering_based_on_centroid_locations.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-8747 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-87472023-08-21T08:53:54Z A quality metric for K-Means clustering based on centroid locations THULASIDAS, Manoj K-Means clustering algorithm does not offer a clear methodology to determine the appropriate number of clusters; it does not have a built-in mechanism for K selection. In this paper, we present a new metric for clustering quality and describe its use for K selection. The proposed metric, based on the locations of the centroids, as well as the desired properties of the clusters, is developed in two stages. In the initial stage, we take into account the full covariance matrix of the clustering variables, thereby making it mathematically similar to a reduced chi2. We then extend it to account for how well the clustering results comply with the underlying assumptions of the K-Means algorithm (namely, balanced clusters in terms of variance and membership), and define our final metric (MC ). We demonstrate, using synthetic and real data sets, how well our metric performs in determining the right number of clusters to form. We also present detailed comparisons with existing quality indexes for automatic determination of the number of clusters. 2022-11-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/7744 info:doi/10.1007/978-3-031-22137-8_16 https://ink.library.smu.edu.sg/context/sis_research/article/8747/viewcontent/A_quality_metric_for_k_means_clustering_based_on_centroid_locations.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University K-Means clustering Quality metrics K selection problem Number of clusters Computer Engineering Numerical Analysis and Scientific Computing Theory and Algorithms |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
K-Means clustering; Quality metrics; K selection problem; Number of clusters; Computer Engineering; Numerical Analysis and Scientific Computing; Theory and Algorithms |
spellingShingle |
K-Means clustering Quality metrics K selection problem Number of clusters Computer Engineering Numerical Analysis and Scientific Computing Theory and Algorithms THULASIDAS, Manoj A quality metric for K-Means clustering based on centroid locations |
description |
The K-Means clustering algorithm does not offer a clear methodology for determining the appropriate number of clusters; it has no built-in mechanism for K selection. In this paper, we present a new metric for clustering quality and describe its use for K selection. The proposed metric, based on the locations of the centroids as well as the desired properties of the clusters, is developed in two stages. In the initial stage, we take into account the full covariance matrix of the clustering variables, thereby making the metric mathematically similar to a reduced chi-squared. We then extend it to account for how well the clustering results comply with the underlying assumptions of the K-Means algorithm (namely, balanced clusters in terms of variance and membership), and define our final metric (MC). We demonstrate, using synthetic and real data sets, how well our metric performs in determining the right number of clusters to form. We also present detailed comparisons with existing quality indexes for automatic determination of the number of clusters. |
format |
text |
author |
THULASIDAS, Manoj |
title |
A quality metric for K-Means clustering based on centroid locations |
title_sort |
quality metric for k-means clustering based on centroid locations |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2022 |
url |
https://ink.library.smu.edu.sg/sis_research/7744 https://ink.library.smu.edu.sg/context/sis_research/article/8747/viewcontent/A_quality_metric_for_k_means_clustering_based_on_centroid_locations.pdf |
_version_ |
1779156955446640640 |
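
The K-selection workflow mentioned in the description field (cluster the data for several candidate values of K, score each result with a quality index, keep the best-scoring K) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the formula for the paper's MC metric is not given in this record, so the mean silhouette coefficient is swapped in as a stand-in quality index, and nothing here should be read as the author's method. The helper names `farthest_first_init`, `kmeans`, `mean_silhouette`, and `select_k`, as well as the toy data, are hypothetical.

```python
import math

def farthest_first_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point that is farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns (labels, centroids)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

def mean_silhouette(points, labels):
    """Stand-in quality index (NOT the paper's MC metric)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items() if m != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)

def select_k(points, k_values):
    """Score each candidate K and return (best K, all scores)."""
    scores = {}
    for k in k_values:
        labels, _ = kmeans(points, k)
        scores[k] = mean_silhouette(points, labels)
    return max(scores, key=scores.get), scores

# Toy data (an assumption for illustration): three tight, well-separated groups.
centres = [(0, 0), (10, 0), (5, 10)]
offsets = [(-0.5, 0), (0.5, 0), (0, -0.5), (0, 0.5), (0, 0)]
points = [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]

best_k, scores = select_k(points, range(2, 7))
print(best_k, scores)
```

On this toy data the scan is expected to favour K = 3, since any other K either merges two groups or splits one. A different index could be substituted for `mean_silhouette` without changing `select_k`, provided that a higher score still means a better clustering.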