Analysis on the impact of feature reduction on time-series data

Bibliographic Details
Main Author: Tan, Ernest Yong En
Other Authors: A S Madhukumar
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
Online Access: https://hdl.handle.net/10356/137889
Institution: Nanyang Technological University
Description
Summary: Time-series classification has many practical applications in areas such as industrial process monitoring and anomaly detection. However, the sheer volume and size of the datasets often make analysis impractical without first performing dimensionality reduction. Recent studies have explored automated feature engineering techniques to automate this step. This project analyzes recent advances in automated feature engineering for time-series classification, focusing on two techniques: catch22 and tsfresh. Both pipelines are implemented in Python on a local machine and compared in terms of class-balanced accuracy, macro-weighted F-score and computation time. Statistical methods such as the 5x2 cross-validation (5x2cv) test are used to determine whether the observed differences are statistically significant. The results suggest that catch22 could perform better than tsfresh on macro-weighted F-score in general, and that both pipelines seem to perform well in terms of class-balanced accuracy on certain types and categories of time-series data. However, these performance differences are not fully supported by the results of the 5x2cv test, so more experiments with a wider variety of datasets must be conducted before a definitive conclusion can be drawn. catch22 also seems to take at least several times longer than tsfresh to extract and select features. Because the feature engineering pipeline is executed only once, catch22 is still likely to be worthwhile if it provides a more comprehensive feature set, especially when F-score is the metric of interest.
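The evaluation measures named in the summary can be illustrated with a minimal, standard-library Python sketch. It is not taken from the project itself and rests on two assumptions: that "macro-weighted F-score" refers to the unweighted mean of per-class F1 scores, and that the 5x2cv test is Dietterich's paired t-test variant (the record does not say which variant was used). The function names and the sample numbers in the usage note are hypothetical.

```python
from math import sqrt


def balanced_accuracy(y_true, y_pred):
    """Class-balanced accuracy: the mean of per-class recall."""
    recalls = []
    for label in sorted(set(y_true)):
        support = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls.append(hits / support)
    return sum(recalls) / len(recalls)


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def five_by_two_cv_t(diffs):
    """Paired t statistic for the 5x2cv test (Dietterich's variant, assumed).

    `diffs` holds five (d1, d2) pairs: the score difference between the two
    pipelines on each fold of five 2-fold cross-validation replications.
    The statistic is compared against a t distribution with 5 degrees of
    freedom.
    """
    if len(diffs) != 5:
        raise ValueError("expected five replications of 2-fold CV")
    variance_sum = 0.0
    for d1, d2 in diffs:
        mean = (d1 + d2) / 2
        variance_sum += (d1 - mean) ** 2 + (d2 - mean) ** 2
    if variance_sum == 0:
        raise ValueError("all fold differences are identical; statistic undefined")
    # Numerator is the difference from the first fold of the first replication.
    return diffs[0][0] / sqrt(variance_sum / 5)
```

As a usage illustration, calling `five_by_two_cv_t` on made-up fold differences such as `[(0.02, 0.04), (0.01, 0.03), (0.03, 0.01), (0.02, 0.02), (0.00, 0.02)]` yields a statistic whose absolute value is below 2.571, the two-sided 5% critical value for 5 degrees of freedom, so a difference of that kind would not be judged significant.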