Detection of the spread of Covid-19 in Indonesia using K-Means Clustering Algorithm / Mohammad Yazdi Pusadan ... [et al.]
Main Authors: | Mohammad Yazdi Pusadan ... [et al.] |
---|---|
Format: | Book Section |
Language: | English |
Published: | Faculty of Computer and Mathematical Sciences, 2023 |
Subjects: | |
Online Access: | https://ir.uitm.edu.my/id/eprint/93953/1/93953.pdf https://ir.uitm.edu.my/id/eprint/93953/ |
Institution: | Universiti Teknologi Mara |
Summary: | This study applies the K-Means algorithm to cluster COVID-19 data by area and identify the regions of Indonesia where the virus has spread most widely, based on the frequency of the data. The training data come from the official Kaggle website and cover the spread of the coronavirus from 2020 to 2021, for a total of 20,816 training records. Regions with a high spread of COVID-19 are identified by clustering on the number of cases, the death rate, and the cure rate in each Indonesian province. Cluster performance is then assessed with an internal validity test based on the silhouette index. With the parameter k=3, the K-Means clustering algorithm performs quite well at detecting the level of COVID-19 spread and identifies the areas of Indonesia with a high spread. In the cluster validity test, the feature sets O = (Total Case, Total Death) and P = (Total Case, Total Death, Total Recovered) yield the same silhouette value, 0.93, which indicates very good cluster quality. |
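The procedure described in the summary (K-Means with k=3, followed by a silhouette check on the feature sets O and P) can be illustrated with a minimal sketch. This is not the authors' implementation. The region names and all figures below are made-up placeholders rather than values from the Kaggle dataset, and the deterministic farthest-first seeding is an assumption chosen only to make the example reproducible. The record does not say how the study initialized the centroids or whether it scaled the features.

```python
import math

def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the first
    point, then repeatedly add the point farthest from the chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's K-Means: alternate assignment and centroid update until stable."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

def silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    def mean_dist(i, cluster):
        d = [math.dist(points[i], q) for j, q in enumerate(points)
             if labels[j] == cluster and j != i]
        return sum(d) / len(d) if d else 0.0

    scores = []
    for i, lab in enumerate(labels):
        if labels.count(lab) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = mean_dist(i, lab)  # mean distance within the point's own cluster
        b = min(mean_dist(i, other) for other in set(labels) if other != lab)
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical rows: (region, total cases, total deaths, total recovered).
rows = [
    ("Region A", 1200, 30, 1100), ("Region B", 1500, 40, 1400),
    ("Region C", 1800, 45, 1700), ("Region D", 25000, 700, 23000),
    ("Region E", 27000, 750, 25000), ("Region F", 30000, 800, 28000),
    ("Region G", 400000, 6000, 380000), ("Region H", 405000, 6100, 385000),
    ("Region I", 410000, 6200, 390000),
]

features_o = [(c, d) for _name, c, d, _r in rows]     # O = (Total Case, Total Death)
features_p = [(c, d, r) for _name, c, d, r in rows]   # P = O plus Total Recovered

labels_o, _ = kmeans(features_o, k=3)
labels_p, _ = kmeans(features_p, k=3)
score_o = silhouette(features_o, labels_o)
score_p = silhouette(features_p, labels_p)

for (name, *_), lab in zip(rows, labels_p):
    print(name, "-> cluster", lab)
print("silhouette O:", round(score_o, 3), "silhouette P:", round(score_p, 3))
```

A silhouette value close to 1 means each region lies much nearer to its own cluster than to the next closest one, which is the sense in which the summary reads 0.93 as very good cluster quality. The scores printed here reflect only the placeholder data.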