Analysis on the impact of feature reduction on time-series data
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2020 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/137889 |
Institution: | Nanyang Technological University |
Summary: | Time-series classification has many practical applications in areas such as industrial process monitoring and anomaly detection. However, the sheer volume and size of the datasets often make it difficult to conduct any analysis without first performing dimensionality reduction. Recent studies have explored automated feature engineering techniques to handle this step.
This project analyzes recent advances in automated feature engineering for time-series classification, focusing on two techniques: catch22 and tsfresh. The project implements both pipelines in Python on a local machine and compares their performance in terms of class-balanced accuracy, macro-weighted F-score and computation time. Statistical methods such as the 5x2 cross-validation (5x2cv) test are also used to determine whether the differences are statistically significant.
The results of this project suggest that catch22 may perform better than tsfresh on macro-weighted F-score in general. Additionally, both pipelines seem to perform well in terms of class-balanced accuracy on certain types and categories of time-series data. However, these performance differences are not fully supported by the results of the 5x2cv test, and more experiments with a wider variety of datasets must be conducted before a definitive conclusion can be drawn. catch22 also seems to take at least several times longer than tsfresh to extract and select features. Given that executing the feature engineering pipeline is a one-off process, it is still likely to be worthwhile to use catch22 if it provides a more comprehensive feature set, especially if F-score is the metric to be used. |
---|
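
The summary names three evaluation tools without defining them: class-balanced accuracy, a macro F-score, and the 5x2cv test. The following is a minimal sketch of how these quantities are commonly computed, not the project's own code. It assumes that "class-balanced accuracy" means the mean of per-class recall, that the F-score is the unweighted mean of per-class F1, and that the 5x2cv test is Dietterich's 5x2cv paired t-test. All function names are hypothetical and chosen for illustration only.

```python
from math import sqrt


def per_class_counts(y_true, y_pred):
    """Return {label: (tp, fp, fn)} for every label seen in either list."""
    counts = {}
    for label in set(y_true) | set(y_pred):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        counts[label] = (tp, fp, fn)
    return counts


def balanced_accuracy(y_true, y_pred):
    """Mean recall over the classes that actually occur in y_true."""
    counts = per_class_counts(y_true, y_pred)
    recalls = [tp / (tp + fn) for tp, _, fn in counts.values() if tp + fn > 0]
    return sum(recalls) / len(recalls)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1; a class with no true positives scores 0."""
    counts = per_class_counts(y_true, y_pred)
    f1s = [2 * tp / (2 * tp + fp + fn) if tp else 0.0
           for tp, fp, fn in counts.values()]
    return sum(f1s) / len(f1s)


def five_by_two_cv_t(diffs):
    """5x2cv paired t statistic (assumed: Dietterich's formulation).

    diffs holds five (d1, d2) pairs: the score difference between the two
    pipelines on fold 1 and fold 2 of each of five 2-fold replications.
    The statistic is compared against a t distribution with 5 degrees of
    freedom. Raises ZeroDivisionError if every variance estimate is zero.
    """
    variances = []
    for d1, d2 in diffs:
        mean = (d1 + d2) / 2
        variances.append((d1 - mean) ** 2 + (d2 - mean) ** 2)
    return diffs[0][0] / sqrt(sum(variances) / len(variances))


# Toy labels and toy score differences, for illustration only.
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 1, 1]
bal_acc = balanced_accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
t_stat = five_by_two_cv_t([(0.02, 0.04)] * 5)
```

Under this formulation, an absolute t statistic above roughly 2.571 (the two-sided 5% critical value at 5 degrees of freedom) would indicate a significant difference between the two pipelines on a given dataset.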