Advanced classification for streaming time series and data streams

Nowadays, overwhelming volumes of sequential data are very common in scientific and business applications, such as biomedicine, stock markets, retail industry, and communication networks. Time series and data streams are the two most popular types of sequential data. The main difference between them...

Bibliographic Details
Main Author:	Nguyen, Hai Long
Other Authors:	Ng Wee Keong
Format:	Theses and Dissertations
Language:	English
Published:	2013
Subjects:	DRNTU::Engineering::Computer science and engineering::Information systems::Information systems applications
Online Access:	https://hdl.handle.net/10356/54815
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-54815
record_format	dspace
spelling	sg-ntu-dr.10356-548152023-03-04T00:47:30Z Advanced classification for streaming time series and data streams Nguyen, Hai Long Ng Wee Keong School of Computer Engineering EADS Innovation Works South Asia Economic Development Board of Singapore Centre for Advanced Information Systems Woon Yew Kwong DRNTU::Engineering::Computer science and engineering::Information systems::Information systems applications DOCTOR OF PHILOSOPHY (SCE) 2013-08-27T09:14:07Z 2013-08-27T09:14:07Z 2012 2012 Thesis Nguyen, H. L. (2012). Advanced classification for streaming time series and data streams. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/54815 10.32657/10356/54815 en 167 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering::Information systems::Information systems applications
spellingShingle	DRNTU::Engineering::Computer science and engineering::Information systems::Information systems applications Nguyen, Hai Long Advanced classification for streaming time series and data streams
description	Large volumes of sequential data are now common in scientific and business applications such as biomedicine, stock markets, the retail industry, and communication networks. Time series and data streams are the two most popular types of sequential data. The main difference between them is that a time series is defined over a single variable, while data streams are generally multivariate. However, they share some distinctive characteristics: they are possibly infinite in volume, time-ordered, and dynamically changing. In this dissertation, we propose classification algorithms for time series and data streams that satisfy strict constraints, such as bounded memory, single-pass processing, real-time response, and concept-drift detection. Here, a concept drift refers to the situation where the data's underlying distribution changes over time. For massive time series datasets, classification algorithms based on motifs (frequent subsequences) are preferable, since they not only have low complexity but can also achieve high accuracy. However, state-of-the-art algorithms can only find motifs of a predefined length, which greatly limits their performance and practicality. To overcome this challenge, we introduce the notion of a closed motif: a motif is closed if there is no longer motif with the same number of occurrences. We also propose a novel closed-motif-based classifier, which is lightweight, effective, and efficient for time series classification. Furthermore, we examine the more challenging problem of classifying data streams in a multivariate domain. Here, we are confronted with the feature drift problem, where the importance or relevance of a set of features changes over time. We propose a general framework that integrates feature selection and heterogeneous ensemble learning; it is able to adapt to different types of concept drift and works well with various kinds of datasets. The ensemble consists of well-chosen online classifiers and is equipped with an optimal weighting method. It updates its online classifier members for gradual drifts and replaces outdated members with new ones for feature drifts. Additionally, we extend our algorithms to a practical environment where labeled data is very scarce and data streams must be mined concurrently in order to make full use of the single-pass data. Conventional stream mining algorithms focus only on stand-alone mining tasks. Therefore, we propose an incremental algorithm that performs clustering and classification concurrently, which not only maximizes throughput but also achieves better mining results. Moreover, enhanced with a novel active learning technique, our algorithm requires only a small number of queries to work well with very sparsely labeled data streams. Finally, as the volume of sequential data grows steadily, a single computer with limited computing power may soon be insufficient for the mining processes. Cloud computing, a cutting-edge technology that provides elastic computing on demand, will certainly facilitate large-scale sequential data mining. Therefore, we plan to adapt and migrate our algorithms to a cloud computing platform in the future.
author2	Ng Wee Keong
author_facet	Ng Wee Keong Nguyen, Hai Long
format	Theses and Dissertations
author	Nguyen, Hai Long
author_sort	Nguyen, Hai Long
title	Advanced classification for streaming time series and data streams
title_short	Advanced classification for streaming time series and data streams
title_full	Advanced classification for streaming time series and data streams
title_fullStr	Advanced classification for streaming time series and data streams
title_full_unstemmed	Advanced classification for streaming time series and data streams
title_sort	advanced classification for streaming time series and data streams
publishDate	2013
url	https://hdl.handle.net/10356/54815
_version_	1759853038327037952
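
Illustrative note on the closed-motif definition in the description above ("a motif is closed if there is no motif with a longer length having the same number of occurrences"). The sketch below is not the thesis's algorithm, which this record does not describe. It is a brute-force toy under several assumptions: the time series has already been discretized into a string of symbols, a motif is any contiguous substring occurring at least `min_count` times (a hypothetical threshold), overlapping occurrences are counted, and "longer motif" is read as a longer motif that contains the shorter one. The function names are made up for illustration.

```python
from collections import Counter


def frequent_substrings(series, min_count):
    """Count every contiguous substring (overlaps included) and keep
    those that occur at least min_count times."""
    counts = Counter(
        series[i:j]
        for i in range(len(series))
        for j in range(i + 1, len(series) + 1)
    )
    return {s: c for s, c in counts.items() if c >= min_count}


def closed_motifs(series, min_count=2):
    """Return motifs that have no longer, containing motif with the
    same occurrence count (assumed reading of 'closed')."""
    motifs = frequent_substrings(series, min_count)
    closed = {}
    for motif, count in motifs.items():
        absorbed = any(
            len(other) > len(motif) and motif in other and other_count == count
            for other, other_count in motifs.items()
        )
        if not absorbed:
            closed[motif] = count
    return closed


if __name__ == "__main__":
    # Twelve substrings of "abcabcab" occur at least twice,
    # but only two of them are closed.
    print(closed_motifs("abcabcab"))  # {'ab': 3, 'abcab': 2}
```

In this toy input, shorter motifs such as "a" or "bc" are absorbed by "ab" and "abcab" respectively, which shows how closedness removes the need to fix a motif length in advance. The enumeration is quadratic in the series length and is meant only to make the definition concrete.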
