A recommendation on how to teach K-means in introductory analytics courses

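We teach K-Means clustering in introductory data analytics courses because it is one of the simplest and most widely used unsupervised machine learning algorithms. However, one drawback of this algorithm is that it does not offer a clear method to determine the appropriate number of clusters; it does not have a built-in mechanism for K selection. What is usually taught as the solution for the K Selection problem is the so-called elbow method, where we look at the incremental changes in some quality metric (usually, the sum of squared errors, SSE), trying to find a sudden change. In addition to SSE, we can find many other metrics and methods in the literature. In this paper, we survey several of them, and conclude that the Variance Ratio Criterion (VRC) is an appropriate metric we should consider teaching for K Selection. From a pedagogical perspective, VRC has desirable mathematical properties, which help emphasize the statistical underpinnings of the algorithm, thereby reinforcing the students’ understanding through experiential learning. We also list the key concepts targeted by the VRC approach and provide ideas for assignments.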
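The abstract contrasts the elbow method (watching SSE fall as K grows) with the Variance Ratio Criterion. The sketch below is an illustration, not the paper's own code: it computes both metrics for K = 2 to 6 on a hypothetical toy data set of three Gaussian blobs. It assumes plain Lloyd's K-means with random restarts and the standard Calinski-Harabasz definition VRC = [B / (K - 1)] / [W / (n - K)], where W is the within-cluster SSE, B is the between-cluster sum of squares, and n is the number of points. All function names are hypothetical, and only the Python standard library is used.

```python
import random

# Minimal illustrative sketch; all function names and the toy data are hypothetical.

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def sse(points, labels, centroids):
    """Within-cluster sum of squared errors, the quantity the elbow method plots."""
    return sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's algorithm with random restarts; returns (sse, labels, centroids) of the best run."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centroids = rng.sample(points, k)  # initial centroids drawn from the data
        for _ in range(max_iter):
            labels = assign(points, centroids)
            # Update step: move each centroid to the mean of its members;
            # an empty cluster keeps its previous centroid.
            new = [mean([p for p, l in zip(points, labels) if l == j])
                   if j in labels else centroids[j]
                   for j in range(k)]
            if new == centroids:  # converged: no centroid moved
                break
            centroids = new
        labels = assign(points, centroids)
        score = sse(points, labels, centroids)
        if best is None or score < best[0]:
            best = (score, labels, centroids)
    return best


def vrc(points, labels, k):
    """Variance Ratio Criterion (Calinski-Harabasz): (B / (k - 1)) / (W / (n - k))."""
    n = len(points)
    grand = mean(points)
    between = within = 0.0
    for j in range(k):
        members = [p for p, l in zip(points, labels) if l == j]
        if not members:
            continue
        c = mean(members)
        between += len(members) * sq_dist(c, grand)    # B: spread of the cluster means
        within += sum(sq_dist(p, c) for p in members)  # W: the same quantity as SSE
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: three Gaussian blobs of 20 points each.
rng = random.Random(42)
centers = [(0.0, 0.0), (8.0, 0.0), (4.0, 7.0)]
data = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
        for cx, cy in centers for _ in range(20)]

sse_by_k, vrc_by_k = {}, {}
for k in range(2, 7):  # VRC is undefined for K = 1
    score, labels, _ = kmeans(data, k)
    sse_by_k[k] = score
    vrc_by_k[k] = vrc(data, labels, k)
    print(f"K={k}  SSE={score:8.1f}  VRC={vrc_by_k[k]:7.1f}")

best_k = max(vrc_by_k, key=vrc_by_k.get)
print("K with the highest VRC:", best_k)
```

SSE generally keeps falling as K grows, so the elbow has to be judged by eye; VRC turns K selection into picking the largest value in one column. For coursework that uses scikit-learn, the same two quantities are available as KMeans.inertia_ and sklearn.metrics.calinski_harabasz_score.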

Bibliographic Details
Main Author:	THULASIDAS, Manoj
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2022
Subjects:	K-Means Clustering; Quality Metrics; K Selection; Variance Ratio Criterion; Higher Education; Numerical Analysis and Scientific Computing
Online Access:	https://ink.library.smu.edu.sg/sis_research/7679 https://ink.library.smu.edu.sg/context/sis_research/article/8682/viewcontent/2022194794.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-8682
record_format	dspace
datestamp	2024-11-20T08:06:11Z
date	2022-12-01T08:00:00Z
mimetype	application/pdf
doi	10.1109/TALE54877.2022.00016
rights	http://creativecommons.org/licenses/by-nc-nd/4.0/
series	Research Collection School Of Computing and Information Systems
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	K-Means Clustering; Quality Metrics; K Selection; Variance Ratio Criterion; Higher Education; Numerical Analysis and Scientific Computing
description	We teach K-Means clustering in introductory data analytics courses because it is one of the simplest and most widely used unsupervised machine learning algorithms. However, one drawback of this algorithm is that it does not offer a clear method to determine the appropriate number of clusters; it does not have a built-in mechanism for K selection. What is usually taught as the solution for the K Selection problem is the so-called elbow method, where we look at the incremental changes in some quality metric (usually, the sum of squared errors, SSE), trying to find a sudden change. In addition to SSE, we can find many other metrics and methods in the literature. In this paper, we survey several of them, and conclude that the Variance Ratio Criterion (VRC) is an appropriate metric we should consider teaching for K Selection. From a pedagogical perspective, VRC has desirable mathematical properties, which help emphasize the statistical underpinnings of the algorithm, thereby reinforcing the students’ understanding through experiential learning. We also list the key concepts targeted by the VRC approach and provide ideas for assignments.
format	text
author	THULASIDAS, Manoj
title	A recommendation on how to teach K-means in introductory analytics courses
publisher	Institutional Knowledge at Singapore Management University
publishDate	2022
url	https://ink.library.smu.edu.sg/sis_research/7679 https://ink.library.smu.edu.sg/context/sis_research/article/8682/viewcontent/2022194794.pdf

Similar Items