Time series clustering and anomaly detection of COVID-19 global cases and deaths
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2022 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/156387 |
Institution: | Nanyang Technological University |
Summary: | The spread and fatality patterns of COVID-19 vary among countries around the globe
for a multitude of reasons, such as governments imposing health safety and quarantine
measures of differing strictness at different times, geographical factors such as
population density, people’s attitudes towards the virus, and the emergence
of different variants of the virus with varying transmissibility and mortality.
Time series clustering and anomaly detection are important analyses that identify patterns in
data, provide insight into the similarities and dissimilarities in the COVID-19
behaviour of different countries, and identify countries with anomalous spread patterns. In
this contribution, two conventional time series clustering methods, K-Means++ and
Agglomerative Hierarchical Clustering, were applied to the global COVID-19 confirmed
cases and deaths for all 194 countries from 22 January 2020 to 22 February 2022, the longest
possible period of 2 years and 1 month. This contribution is arguably the first to apply
these clustering algorithms to COVID-19 data over the longest possible time period
and for all 194 countries available in the dataset.
For the K-Means++ clustering algorithm, two distance metrics were used, namely
Euclidean Distance and Dynamic Time Warping, and the performance of the two algorithms
and their clustering results were compared and analysed.
Finally, after the identification of anomalous clusters and evaluation of the clustering results,
point anomaly detection was used to identify anomalous data points in the time series of the
selected countries: Singapore, Faroe Islands, Peru, and Grenada. Three machine learning
algorithms, namely Isolation Forest, Clustering-Based Local Outlier Factor, and One-Class
Support Vector Machine, were applied to detect anomalous points in each individual time
series. The performance and results of the three algorithms were analysed, and the common
point anomalies detected by all three algorithms were extracted for a more holistic evaluation. |
---|
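
The summary compares Euclidean Distance with Dynamic Time Warping (DTW) as the distance metric for K-Means++. As a minimal sketch of why the two metrics can disagree, and not the project's own code, the Python below computes both distances for two toy series that share the same wave shape but peak at different times. The function names and the toy values are illustrative assumptions, not taken from the record.

```python
import math


def euclidean_distance(a, b):
    """Point-by-point distance; only defined for equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW with a squared point-wise cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] repeats the match with b[j-1]
                cost[i][j - 1],      # b[j-1] repeats the match with a[i-1]
                cost[i - 1][j - 1],  # both series advance together
            )
    return math.sqrt(cost[n][m])


# Hypothetical toy data: the same wave, peaking two steps apart.
wave_early = [0, 0, 1, 5, 1, 0, 0, 0]
wave_late = [0, 0, 0, 0, 1, 5, 1, 0]

print(round(euclidean_distance(wave_early, wave_late), 2))  # 7.21
print(dtw_distance(wave_early, wave_late))                  # 0.0
```

Euclidean Distance penalises the time shift, while DTW warps the time axis and treats the two waves as identical. On real case and death counts, the series would presumably need scaling before either metric is meaningful across countries of different sizes; that step is not shown here.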