Analysis on the impact of feature reduction on time-series data

Time-Series Classification has many practical applications in areas like industrial process monitoring and anomaly detection. However, the sheer volume and size of the datasets often make it difficult to conduct any analysis without first performing dimensionality reduction. Recent studies have expl...

Bibliographic Details
Main Author:	Tan, Ernest Yong En
Other Authors:	A S Madhukumar
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2020
Subjects:	Science::Mathematics::Applied mathematics::Data visualization; Engineering::Computer science and engineering
Online Access:	https://hdl.handle.net/10356/137889
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-137889
record_format	dspace
spelling	sg-ntu-dr.10356-137889 2020-04-17T05:47:40Z Tan, Ernest Yong En A S Madhukumar School of Computer Science and Engineering asmadhukumar@ntu.edu.sg Bachelor of Engineering (Computer Science) 2020-04-17T05:47:40Z 2020 Final Year Project (FYP) https://hdl.handle.net/10356/137889 en SCSE19-0363 application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
country	Singapore
collection	DR-NTU
language	English
topic	Science::Mathematics::Applied mathematics::Data visualization; Engineering::Computer science and engineering
description	Time-series classification has many practical applications in areas such as industrial process monitoring and anomaly detection. However, the sheer volume and size of the datasets often make analysis difficult without first performing dimensionality reduction. Recent studies have explored automated feature engineering techniques to handle this step. This project analyzes recent advances in automated feature engineering for time-series classification, focusing on two techniques: catch22 and tsfresh. The project implements both pipelines in Python on a local machine and compares their performance in terms of class-balanced accuracy, macro-weighted F-score and computation time. Statistical methods such as the 5x2 cross-validation (5x2cv) test are used to determine whether the differences are statistically significant. The results suggest that catch22 could potentially perform better than tsfresh on macro-weighted F-score in general. Additionally, both pipelines seem to perform well in terms of class-balanced accuracy on certain types and categories of time-series data. However, these performance differences are not fully supported by the results of the 5x2cv test, and more experiments with a wider variety of datasets must be conducted before a definitive conclusion can be drawn. catch22 also seems to take at least several times longer than tsfresh to extract and select features. Nevertheless, given that executing the feature engineering pipeline is a one-off process, catch22 is still likely to be worthwhile if it provides a more comprehensive feature set, especially if F-score is the metric of interest.
author2	A S Madhukumar
author_facet	A S Madhukumar Tan, Ernest Yong En
format	Final Year Project
author	Tan, Ernest Yong En
title	Analysis on the impact of feature reduction on time-series data
publisher	Nanyang Technological University
publishDate	2020
url	https://hdl.handle.net/10356/137889
_version_	1681058558193434624
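
The description names three evaluation tools: class-balanced accuracy, a macro-weighted F-score, and the 5x2cv test. The sketch below shows one common way to compute each in plain Python. It is an illustration only, not the project's code: the function names are hypothetical, "macro-weighted F-score" is assumed to mean the unweighted mean of per-class F1, and the test statistic follows Dietterich's 5x2cv paired t-test, which may differ from the variant the project used.

```python
from math import sqrt


def balanced_accuracy(y_true, y_pred):
    """Class-balanced accuracy: the mean of per-class recall."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def macro_f1(y_true, y_pred):
    """Macro F1 (assumed meaning): unweighted mean of per-class F1."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def five_by_two_cv_t(diffs):
    """Dietterich's 5x2cv paired t statistic.

    diffs holds five (fold1, fold2) pairs, each the score of pipeline A
    minus the score of pipeline B on one fold of a 2-fold split.
    The statistic is compared against a t distribution with 5 degrees
    of freedom (two-sided 5% critical value: about 2.571).
    """
    if len(diffs) != 5:
        raise ValueError("expected exactly five (fold1, fold2) pairs")
    variances = []
    for d1, d2 in diffs:
        mean = (d1 + d2) / 2
        variances.append((d1 - mean) ** 2 + (d2 - mean) ** 2)
    pooled = sum(variances) / 5
    if pooled == 0:
        raise ValueError("zero variance across folds; statistic undefined")
    return diffs[0][0] / sqrt(pooled)


# Made-up labels and score differences, for illustration only.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(balanced_accuracy(y_true, y_pred))  # 0.625
print(macro_f1(y_true, y_pred))           # 0.625
print(five_by_two_cv_t([(0.02, 0.04)] * 5))
```

With the made-up differences above, the statistic is about 1.41, below the 2.571 threshold, so that toy comparison would not be declared significant at the 5% level.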
