K-sums clustering: A stochastic optimization approach

In this paper, we revisit the decades-old clustering method k-means. The egg-chicken loop in traditional k-means has been replaced by a pure stochastic optimization procedure. The optimization is undertaken from the perspective of each individual sample. Different from existing incremental k-means,...

Full description

Saved in:

Bibliographic Details
Main Authors:	ZHAO Wan-Lei, LAN, Shi Ying, CHEN, Run-Qing, NGO, Chong-wah
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2021
Subjects:	Driven function k-means Stochastic optimization Databases and Information Systems
Online Access:	https://ink.library.smu.edu.sg/sis_research/6806 https://ink.library.smu.edu.sg/context/sis_research/article/7809/viewcontent/K_sums.pdf
Tags:	Add Tag No Tags, Be the first to tag this record!
Institution:	Singapore Management University
Language:	English

id	sg-smu-ink.sis_research-7809
record_format	dspace
spelling	sg-smu-ink.sis_research-78092022-04-19T02:37:25Z K-sums clustering: A stochastic optimization approach ZHAO Wan-Lei, LAN, Shi Ying CHEN, Run-Qing NGO, Chong-wah In this paper, we revisit the decades-old clustering method k-means. The egg-chicken loop in traditional k-means has been replaced by a pure stochastic optimization procedure. The optimization is undertaken from the perspective of each individual sample. Different from existing incremental k-means, an individual sample is tentatively joined into a new cluster to evaluate its distance to the corresponding new centroid, in which the contribution from this sample is accounted. The sample is moved to this new cluster concretely only after we find the reallocation makes the sample closer to the new centroid than it is to the current one. Compared with traditional k-means and other variants, this new procedure allows the clustering to converge faster to a better local minimum. This fundamental modification over the k-means loop leads to the redefinition of a family of k-means variants, such as hierarchical k-means, and Sequential k-means. As an extension, a new target function that minimizes the summation of pairwise distances within clusters is presented. Under l2-norm, it could be solved under the same stochastic optimization procedure. The re-defined traditional k-means, hierarchical k-means, as well as Sequential kmeans all show considerable performance improvement over their traditional counterparts under different settings and on various types of datasets 2021-11-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/6806 info:doi/10.1145/3459637.3482359 https://ink.library.smu.edu.sg/context/sis_research/article/7809/viewcontent/K_sums.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Driven function k-means Stochastic optimization Databases and Information Systems
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Driven function k-means Stochastic optimization Databases and Information Systems
spellingShingle	Driven function k-means Stochastic optimization Databases and Information Systems ZHAO Wan-Lei, LAN, Shi Ying CHEN, Run-Qing NGO, Chong-wah K-sums clustering: A stochastic optimization approach
description	In this paper, we revisit the decades-old clustering method k-means. The egg-chicken loop in traditional k-means has been replaced by a pure stochastic optimization procedure. The optimization is undertaken from the perspective of each individual sample. Different from existing incremental k-means, an individual sample is tentatively joined into a new cluster to evaluate its distance to the corresponding new centroid, in which the contribution from this sample is accounted. The sample is moved to this new cluster concretely only after we find the reallocation makes the sample closer to the new centroid than it is to the current one. Compared with traditional k-means and other variants, this new procedure allows the clustering to converge faster to a better local minimum. This fundamental modification over the k-means loop leads to the redefinition of a family of k-means variants, such as hierarchical k-means, and Sequential k-means. As an extension, a new target function that minimizes the summation of pairwise distances within clusters is presented. Under l2-norm, it could be solved under the same stochastic optimization procedure. The re-defined traditional k-means, hierarchical k-means, as well as Sequential kmeans all show considerable performance improvement over their traditional counterparts under different settings and on various types of datasets
format	text
author	ZHAO Wan-Lei, LAN, Shi Ying CHEN, Run-Qing NGO, Chong-wah
author_facet	ZHAO Wan-Lei, LAN, Shi Ying CHEN, Run-Qing NGO, Chong-wah
author_sort	ZHAO Wan-Lei,
title	K-sums clustering: A stochastic optimization approach
title_short	K-sums clustering: A stochastic optimization approach
title_full	K-sums clustering: A stochastic optimization approach
title_fullStr	K-sums clustering: A stochastic optimization approach
title_full_unstemmed	K-sums clustering: A stochastic optimization approach
title_sort	k-sums clustering: a stochastic optimization approach
publisher	Institutional Knowledge at Singapore Management University
publishDate	2021
url	https://ink.library.smu.edu.sg/sis_research/6806 https://ink.library.smu.edu.sg/context/sis_research/article/7809/viewcontent/K_sums.pdf
_version_	1770576072460468224

K-sums clustering: A stochastic optimization approach

Similar Items