Analysis on the impact of feature reduction on time-series data
Time-Series Classification has many practical applications in areas like industrial process monitoring and anomaly detection. However, the sheer volume and size of the datasets often make it difficult to conduct any analysis without first performing dimensionality reduction. Recent studies have expl...
Main Author: | Tan, Ernest Yong En |
---|---|
Other Authors: | A S Madhukumar |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2020 |
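Subjects: | Science::Mathematics::Applied mathematics::Data visualization; Engineering::Computer science and engineering |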
Online Access: | https://hdl.handle.net/10356/137889 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-137889 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-1378892020-04-17T05:47:40Z Analysis on the impact of feature reduction on time-series data Tan, Ernest Yong En A S Madhukumar School of Computer Science and Engineering asmadhukumar@ntu.edu.sg Science::Mathematics::Applied mathematics::Data visualization Engineering::Computer science and engineering Time-Series Classification has many practical applications in areas like industrial process monitoring and anomaly detection. However, the sheer volume and size of the datasets often make it difficult to conduct any analysis without first performing dimensionality reduction. Recent studies have explored automated feature engineering techniques to automate this process. This project analyzes the most recent advances in automated feature engineering for time-series data classification, focusing specifically on two techniques: catch22 and tsfresh. The project implements both pipelines in Python on a local machine and analyzes their performance in terms of class-balanced accuracy, macro-weighted F-score and computation time. Statistical methods like the 5x2 Cross Validation (CV) test are also used to determine if the differences were statistically significant. The results of this project suggest that catch22 could potentially perform better than tsfresh on macro-weighted F-scores in general. Additionally, both pipelines seem to perform well in terms of class-balanced accuracy on certain types and categories of time-series data. Unfortunately, these performance differences are not fully supported by the results of the 5x2cv test. More experiments with a wider variety of datasets must be conducted before a definitive conclusion can be made. Also, catch22 seems to take at least multiple times longer than tsfresh in extracting and selecting features. 
However, given that executing the feature engineering pipeline is a one-off process, it is still likely to be worthwhile to use catch22 if it provides a more comprehensive feature set, especially if F-score is the metric to be used. Bachelor of Engineering (Computer Science) 2020-04-17T05:47:40Z 2020-04-17T05:47:40Z 2020 Final Year Project (FYP) https://hdl.handle.net/10356/137889 en SCSE19-0363 application/pdf Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
country |
Singapore |
collection |
DR-NTU |
language |
English |
topic |
Science::Mathematics::Applied mathematics::Data visualization Engineering::Computer science and engineering |
spellingShingle |
Science::Mathematics::Applied mathematics::Data visualization Engineering::Computer science and engineering Tan, Ernest Yong En Analysis on the impact of feature reduction on time-series data |
description |
Time-series classification has many practical applications in areas such as industrial process monitoring and anomaly detection. However, the sheer volume and size of the datasets often make it difficult to conduct any analysis without first performing dimensionality reduction. Recent studies have explored automated feature engineering techniques for this purpose.
This project analyzes recent advances in automated feature engineering for time-series classification, focusing specifically on two techniques: catch22 and tsfresh. The project implements both pipelines in Python on a local machine and analyzes their performance in terms of class-balanced accuracy, macro-weighted F-score and computation time. Statistical methods such as the 5x2 cross-validation (5x2cv) test are also used to determine whether the differences are statistically significant.
The results suggest that catch22 may generally outperform tsfresh on macro-weighted F-score. Additionally, both pipelines seem to perform well in terms of class-balanced accuracy on certain types and categories of time-series data. However, these performance differences are not fully supported by the results of the 5x2cv test, and more experiments with a wider variety of datasets are needed before a definitive conclusion can be drawn. catch22 also appears to take several times longer than tsfresh to extract and select features. Nevertheless, because the feature engineering pipeline is executed only once, catch22 is still likely to be worthwhile if it provides a more comprehensive feature set, especially when F-score is the metric of interest. |
author2 |
A S Madhukumar |
author_facet |
A S Madhukumar Tan, Ernest Yong En |
format |
Final Year Project |
author |
Tan, Ernest Yong En |
author_sort |
Tan, Ernest Yong En |
title |
Analysis on the impact of feature reduction on time-series data |
publisher |
Nanyang Technological University |
publishDate |
2020 |
url |
https://hdl.handle.net/10356/137889 |
_version_ |
1681058558193434624 |
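
The description above names three evaluation tools: class-balanced accuracy, a macro-weighted F-score, and the 5x2cv test. The sketch below illustrates how such quantities are commonly computed; it is not taken from the project itself. It assumes that "macro-weighted F-score" refers to the unweighted mean of per-class F1 scores and that the 5x2cv test is the paired t-test form (five repetitions of 2-fold cross-validation). All function names are hypothetical.

```python
from math import sqrt


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        support = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / support)
    return sum(recalls) / len(recalls)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed reading of 'macro-weighted')."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def five_by_two_cv_t(diffs):
    """t statistic of the 5x2cv paired t-test.

    diffs holds five (d1, d2) pairs: the score difference between the two
    pipelines on fold 1 and fold 2 of each 2-fold repetition.
    """
    if len(diffs) != 5:
        raise ValueError("expected five repetitions of 2-fold CV")
    variances = []
    for d1, d2 in diffs:
        mean = (d1 + d2) / 2
        variances.append((d1 - mean) ** 2 + (d2 - mean) ** 2)
    pooled = sum(variances) / 5
    if pooled == 0:
        raise ValueError("zero variance: statistic is undefined")
    # The numerator is the difference from the first fold of the first repetition.
    return diffs[0][0] / sqrt(pooled)


if __name__ == "__main__":
    # Made-up score differences (pipeline A minus pipeline B), for illustration only.
    example = [(0.02, 0.04), (0.01, 0.03), (0.05, 0.03), (0.02, 0.00), (0.03, 0.01)]
    t = five_by_two_cv_t(example)
    # Compare |t| with the two-sided 5% critical value of a t distribution
    # with 5 degrees of freedom (about 2.571).
    print(f"t = {t:.3f}, significant at 5%: {abs(t) > 2.571}")
```

Under these assumptions, a difference between pipelines would be called significant at the 5% level only when the absolute value of the statistic exceeds roughly 2.571. This is one way in which an apparent gap in F-score can fail to be supported by the test.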