Analysis on the impact of feature reduction on time-series data

Time-Series Classification has many practical applications in areas like industrial process monitoring and anomaly detection. However, the sheer volume and size of the datasets often make it difficult to conduct any analysis without first performing dimensionality reduction. Recent studies have expl...

Bibliographic Details
Main Author: Tan, Ernest Yong En
Other Authors: A S Madhukumar
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
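Subjects: Science::Mathematics::Applied mathematics::Data visualization; Engineering::Computer science and engineering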
Online Access: https://hdl.handle.net/10356/137889
Institution: Nanyang Technological University
id sg-ntu-dr.10356-137889
record_format dspace
spelling sg-ntu-dr.10356-137889 2020-04-17T05:47:40Z Analysis on the impact of feature reduction on time-series data Tan, Ernest Yong En A S Madhukumar School of Computer Science and Engineering asmadhukumar@ntu.edu.sg Science::Mathematics::Applied mathematics::Data visualization Engineering::Computer science and engineering Bachelor of Engineering (Computer Science) 2020-04-17T05:47:40Z 2020-04-17T05:47:40Z 2020 Final Year Project (FYP) https://hdl.handle.net/10356/137889 en SCSE19-0363 application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
country Singapore
collection DR-NTU
language English
topic Science::Mathematics::Applied mathematics::Data visualization
Engineering::Computer science and engineering
description Time-series classification has many practical applications in areas such as industrial process monitoring and anomaly detection. However, the sheer volume and size of the datasets often make it difficult to conduct any analysis without first performing dimensionality reduction. Recent studies have explored automated feature engineering techniques for this purpose. This project analyzes the most recent advances in automated feature engineering for time-series classification, focusing on two techniques: catch22 and tsfresh. The project implements both pipelines in Python on a local machine and compares their performance in terms of class-balanced accuracy, macro-weighted F-score and computation time. Statistical methods such as the 5x2 cross-validation (5x2cv) test are also used to determine whether the observed differences are statistically significant. The results suggest that catch22 could perform better than tsfresh on macro-weighted F-score in general. Additionally, both pipelines seem to perform well in terms of class-balanced accuracy on certain types and categories of time-series data. However, these performance differences are not fully supported by the results of the 5x2cv test, and more experiments with a wider variety of datasets are needed before a definitive conclusion can be drawn. In addition, catch22 appears to take several times longer than tsfresh to extract and select features. Given that executing the feature engineering pipeline is a one-off process, it is still likely to be worthwhile to use catch22 if it provides a more comprehensive feature set, especially if F-score is the metric of interest.
author2 A S Madhukumar
author_facet A S Madhukumar
Tan, Ernest Yong En
format Final Year Project
author Tan, Ernest Yong En
author_sort Tan, Ernest Yong En
title Analysis on the impact of feature reduction on time-series data
publisher Nanyang Technological University
publishDate 2020
url https://hdl.handle.net/10356/137889