Data stream mining

Bibliographic Details
Main Author: Wan, Li
Other Authors: Ng Wee Keong
Format: Final Year Project
Language: English
Published: 2009
Subjects: DRNTU::Engineering::Computer science and engineering::Information systems::Database management
Online Access: http://hdl.handle.net/10356/17010
Institution: Nanyang Technological University
School: School of Computer Engineering
Research Centre: Centre for Advanced Information Systems
Degree: Bachelor of Engineering (Computer Engineering)
Physical Description: 59 p.
File Format: application/pdf
Date Available: 2009-05-29
Collection: DR-NTU, NTU Library (Singapore)

Description

The data stream mining problem has been studied extensively in recent years, owing to the great ease of collecting stream data. The essential constraint on a data stream mining algorithm is that it can read the data only once. Unfortunately, most traditional data mining algorithms lack this single-scan property. A data stream is usually considered semi-infinite, so it is impossible to store all past data with limited resources. Mining high-dimensional data streams is therefore a challenging task.

In this report, we present several observations on the feature quality stream (FQS), which is obtained from a data stream in real time, and propose a framework for analyzing such a stream. The results of the FQS analysis are used to reduce the dimensionality of data streams.

We also propose MR-Stream, an efficient data stream clustering framework that:
(1) computes and updates synopsis information in constant time;
(2) allows users to discover clusters at multiple resolutions;
(3) determines the right time for users to generate clusters from the synopsis information;
(4) generates clusters of higher purity than existing algorithms; and
(5) determines the right threshold function for density-based clustering based on the fading model of stream data.

MR-Stream can be extended to solve classification problems. The classification results obtained from the online component of the MR-Stream framework are available in real time and are presented as a probability distribution table over the different classes.
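
The single-scan, fading-model setting described above can be illustrated with a small grid synopsis. The sketch below is a hypothetical Python illustration of a generic density-grid approach with exponential decay, not the MR-Stream algorithm from the report. The class name `FadingGrid`, the parameters `cell_size`, `decay`, `levels` and `threshold`, and the hand-picked density threshold are all assumptions made for illustration. Each point is folded into its cells once, and old weight is faded lazily when a cell is next touched, so the per-point cost depends on the dimensionality and the number of resolution levels but not on the length of the stream.

```python
import math


class FadingGrid:
    """Hypothetical single-pass grid synopsis with exponential fading.

    Every point is read exactly once and folded into per-cell summaries;
    the raw points are never stored. A point's weight is multiplied by
    `decay` (0 < decay < 1) for every time step that passes.
    """

    def __init__(self, cell_size, decay=0.99, levels=1):
        self.cell_size = cell_size
        self.decay = decay
        self.levels = levels
        # (level, cell key) -> (faded density, time of last update)
        self.cells = {}

    def _key(self, point, level):
        # Cells at level h are 2**h times finer than the level-0 cells.
        size = self.cell_size / (2 ** level)
        return tuple(math.floor(x / size) for x in point)

    def insert(self, point, t):
        """Fold one point into every level; the cost does not grow with stream length."""
        for level in range(self.levels):
            key = (level, self._key(point, level))
            weight, last = self.cells.get(key, (0.0, t))
            # Fade the stored weight lazily up to time t, then add the new point.
            self.cells[key] = (weight * self.decay ** (t - last) + 1.0, t)

    def density(self, key, now):
        """Faded density of a (level, cell) key as seen at time `now`."""
        weight, last = self.cells.get(key, (0.0, now))
        return weight * self.decay ** (now - last)

    def dense_cells(self, level, threshold, now):
        """Cells at one resolution whose faded density reaches `threshold`."""
        return sorted(
            cell
            for (lvl, cell) in self.cells
            if lvl == level and self.density((lvl, cell), now) >= threshold
        )


if __name__ == "__main__":
    grid = FadingGrid(cell_size=1.0, decay=0.9, levels=2)
    stream = [(0.2, 0.3), (0.4, 0.1), (0.6, 0.7), (5.1, 5.2)]
    for t, point in enumerate(stream):
        grid.insert(point, t)
    print(grid.dense_cells(level=0, threshold=2.0, now=3))  # → [(0, 0)]
```

With `decay=0.9`, the three nearby points leave level-0 cell `(0, 0)` with a faded density of about 2.44 at time 3, above the threshold of 2.0, while the isolated point's cell holds only 1.0. At the finer level the same points split across two cells, which is the kind of resolution-dependent view that property (2) refers to. The fixed threshold here is only a placeholder; property (5) indicates that MR-Stream instead derives its threshold function from the fading model.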