Time series clustering and anomaly detection of COVID-19 global cases and deaths

The spread and fatality patterns of COVID-19 vary among countries around the globe for a multitude of reasons, such as governments imposing health safety and quarantine measures of differing strictness at different times, geographical factors such as the population...

Bibliographic Details
Main Author: Liew, Zhi Li
Other Authors: Ke Yiping, Kelly
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2022
Subjects: Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/156387
Institution: Nanyang Technological University
id sg-ntu-dr.10356-156387
record_format dspace
spelling sg-ntu-dr.10356-156387 2022-04-16T09:18:47Z
Time series clustering and anomaly detection of COVID-19 global cases and deaths
Liew, Zhi Li
Ke Yiping, Kelly
School of Computer Science and Engineering
ypke@ntu.edu.sg
Engineering::Computer science and engineering
Bachelor of Engineering (Computer Science)
2022-04-16T09:18:46Z
2022
Final Year Project (FYP)
Liew, Z. L. (2022). Time series clustering and anomaly detection of COVID-19 global cases and deaths. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/156387
en
SCSE21-0368
application/pdf
Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
description The spread and fatality patterns of COVID-19 vary among countries around the globe for a multitude of reasons, such as governments imposing health safety and quarantine measures of differing strictness at different times, geographical factors such as population density, people's attitudes towards the virus, and the emergence of different variants of the virus with varying transmissibility and mortality. Time series clustering and anomaly detection are important analyses that identify patterns in data, provide insight into the similarity and dissimilarity of COVID-19 behaviour across countries, and identify countries with anomalous spread patterns. In this contribution, two conventional time series clustering methods, K-Means++ and Agglomerative Hierarchical Clustering, were applied to the global COVID-19 confirmed cases and deaths for all 194 countries from 22 January 2020 to 22 February 2022, the longest possible period of 2 years and 1 month. This contribution is arguably the first to perform clustering over the longest possible time period for COVID-19 data using these clustering algorithms, and for all 194 countries available in the dataset. For the K-Means++ clustering algorithm, two different distance metrics were used, namely Euclidean Distance and Dynamic Time Warping, and the performance and clustering results of the two variants were compared and analysed. Finally, after the identification of anomalous clusters and evaluation of the clustering results, point anomaly detection was used to identify anomalous data points in the time series of four selected countries: Singapore, Faroe Islands, Peru, and Grenada. Three machine learning algorithms, namely Isolation Forest, Clustering Based Local Outlier Factor, and One Class Support Vector Machine, were applied to detect anomalous points in each individual time series. The performance and results of the three algorithms were analysed, and the point anomalies detected by all three algorithms were extracted for a more holistic evaluation.
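The description names Euclidean Distance and Dynamic Time Warping (DTW) as the two distance metrics compared under K-Means++. The sketch below is only an illustration of how the two metrics can differ, not the project's implementation: the function names and the two toy series are hypothetical, and the DTW variant shown (squared point differences accumulated along the warping path, then a square root) is one common convention that the record does not confirm the project used.

```python
import math


def euclidean_distance(a, b):
    """Point-by-point distance; both series must have the same length."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs equal-length series")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Dynamic Time Warping distance via the classic O(n*m) dynamic program."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of: step in a only, step in b only, step in both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


# Hypothetical toy data: the same "wave" of cases, shifted by two time steps.
wave_early = [0, 0, 1, 5, 1, 0, 0, 0]
wave_late = [0, 0, 0, 0, 1, 5, 1, 0]

print("Euclidean:", round(euclidean_distance(wave_early, wave_late), 3))  # 7.211
print("DTW:", dtw_distance(wave_early, wave_late))  # 0.0
```

In this toy case the Euclidean distance penalises the time shift, while DTW aligns the two peaks and reports a distance of zero. This is one reason the two metrics can place countries whose waves have similar shapes but different timing into different clusters.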
author2 Ke Yiping, Kelly
author_facet Ke Yiping, Kelly
Liew, Zhi Li
format Final Year Project
author Liew, Zhi Li
author_sort Liew, Zhi Li
title Time series clustering and anomaly detection of COVID-19 global cases and deaths
publisher Nanyang Technological University
publishDate 2022
url https://hdl.handle.net/10356/156387