Nearest Centroid: A bridge between statistics and machine learning

To guide our students of machine learning in their statistical thinking, we need conceptually simple and mathematically defensible algorithms. In this paper, we present the Nearest Centroid (NC) algorithm as a pedagogical tool, combining the key concepts behind two foundational al...

Bibliographic Details
Main Author: THULASIDAS, Manoj
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2020
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/5555
https://ink.library.smu.edu.sg/context/sis_research/article/6558/viewcontent/Nearest_Centroid_av.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-6558
record_format dspace
spelling sg-smu-ink.sis_research-6558 2021-05-07T06:06:55Z 2020-12-01T08:00:00Z text application/pdf info:doi/10.1109/TALE48869.2020.9368396 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic statistical thinking
applied statistics
machine learning
nearest centroid
k-means clustering
k nearest neighbor
Artificial Intelligence and Robotics
Databases and Information Systems
description To guide our students of machine learning in their statistical thinking, we need conceptually simple and mathematically defensible algorithms. In this paper, we present the Nearest Centroid (NC) algorithm as a pedagogical tool, combining the key concepts behind two foundational algorithms: K-Means clustering and k-Nearest Neighbors (k-NN). In NC, we compute the centroid (as defined in the K-Means algorithm) of the observations belonging to each class in our training data set and use its distance from a new observation (similar to k-NN) for class prediction. Using this natural extension, we illustrate how the concepts of probability and statistics are applied in machine learning algorithms. Furthermore, we describe how the practical aspects of validation and performance measurement are carried out. As described toward the end of this paper, the algorithm and the work presented here can be easily converted to labs and reading assignments that cement students' understanding of applied statistics and its connection to machine learning algorithms.
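The prediction rule summarized in the description can be sketched in a few lines of Python. This is a minimal illustration, not the paper's own code. The function names `fit_centroids` and `predict`, the toy data, and the choice of Euclidean distance are all assumptions, since the abstract does not specify a distance metric.

```python
import math
from collections import defaultdict


def fit_centroids(X, y):
    """Return {label: centroid}; each centroid is the per-feature mean of its class."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def predict(centroids, x):
    """Return the label whose centroid is nearest to x (Euclidean distance, an assumption)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


# Toy training data, made up purely for illustration.
X_train = [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [8.0, 8.0], [9.0, 8.0], [8.0, 9.0]]
y_train = ["a", "a", "a", "b", "b", "b"]

centroids = fit_centroids(X_train, y_train)
print(predict(centroids, [2.0, 2.0]))
print(predict(centroids, [7.0, 7.0]))
```

Training reduces to one mean per class, and prediction to one distance per class, which is what makes the method easy to trace by hand in a lab setting.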
format text
author THULASIDAS, Manoj
title Nearest Centroid: A bridge between statistics and machine learning
publisher Institutional Knowledge at Singapore Management University
publishDate 2020
url https://ink.library.smu.edu.sg/sis_research/5555
https://ink.library.smu.edu.sg/context/sis_research/article/6558/viewcontent/Nearest_Centroid_av.pdf
_version_ 1770575508133642240