Concurrent data stream mining

Given the characteristics of streaming data (read-once and unbounded), it is desirable to perform multiple, concurrent types of mining on streaming data to the fullest extent permitted by resource constraints. However, to the best of our knowledge, conventional stream mining algorit...

Bibliographic Details
Main Author: Wang, Wenwen.
Other Authors: Ng Wee Keong
Format: Final Year Project
Language: English
Published: 2009
Subjects: DRNTU::Engineering::Computer science and engineering::Information systems::Database management
Online Access: http://hdl.handle.net/10356/16935
Institution: Nanyang Technological University
id sg-ntu-dr.10356-16935
record_format dspace
spelling sg-ntu-dr.10356-16935 2023-03-03T20:46:00Z Concurrent data stream mining Wang, Wenwen. Ng Wee Keong School of Computer Engineering Centre for Advanced Information Systems DRNTU::Engineering::Computer science and engineering::Information systems::Database management
Bachelor of Engineering (Computer Engineering) 2009-05-29T02:06:36Z 2009-05-29T02:06:36Z 2009 2009 Final Year Project (FYP) http://hdl.handle.net/10356/16935 en Nanyang Technological University 77 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Information systems::Database management
description Given the characteristics of streaming data (read-once and unbounded), it is desirable to perform multiple, concurrent types of mining on streaming data to the fullest extent permitted by resource constraints. However, to the best of our knowledge, conventional stream mining algorithms focus on single, standalone mining tasks. In this report, we attempt to achieve concurrent classification and clustering on streaming data. Our integrated framework, MM-Stream, follows the conventional online-offline approach in stream mining. The framework is divided into two components: as data streams in, the online component completes all necessary processing within constant time and then discards the data; when a mining request is issued by a user, the offline component performs the requested mining task(s) using the information collected by the online component. We implemented and evaluated the algorithm. The performance evaluation showed that the performance of MM-Stream is comparable to or better than that of existing standalone stream mining algorithms (D-Stream, On-Demand-Stream classifier). We also investigated how the purity and accuracy of such integrated mining compare with those of standalone clustering and classification, respectively. The results showed that, with concurrent mining, we can achieve almost double the throughput without any degradation in the quality of the mining results. We believe that our successful experimentation with MM-Stream's concurrent stream classification and clustering paves the way for incorporating more concurrent data mining tasks to maximize the output of stream data mining.
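The online-offline split described in the abstract can be illustrated with a small sketch. This is not the MM-Stream implementation, which this record does not describe in detail; it assumes a simple grid summary (loosely in the spirit of density-grid methods such as D-Stream), and every name in it (`OnlineSummary`, `offline_cluster`, `offline_classify`, `cell_size`, `min_density`) is hypothetical. The point is only that one shared summary, updated with constant work per record, can serve both a clustering request and a classification request.

```python
from collections import Counter, defaultdict


class OnlineSummary:
    """Hypothetical online component: constant work per record, raw data is not kept."""

    def __init__(self, cell_size=1.0):
        self.cell_size = cell_size
        self.density = Counter()            # grid cell -> number of records seen
        self.labels = defaultdict(Counter)  # grid cell -> counts of class labels seen

    def cell_of(self, point):
        # Map a point to the integer coordinates of its grid cell.
        return tuple(int(x // self.cell_size) for x in point)

    def observe(self, point, label=None):
        # Update the shared summary; the record itself is then dropped.
        cell = self.cell_of(point)
        self.density[cell] += 1
        if label is not None:
            self.labels[cell][label] += 1


def offline_cluster(summary, min_density=2):
    """Offline clustering request: group axis-adjacent dense cells into clusters."""
    dense = {c for c, n in summary.density.items() if n >= min_density}
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            cell = stack.pop()
            component.add(cell)
            for axis in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
        clusters.append(component)
    return clusters


def offline_classify(summary, point):
    """Offline classification request: majority label of the point's cell, or None."""
    counts = summary.labels.get(summary.cell_of(point))
    return counts.most_common(1)[0][0] if counts else None
```

In this sketch both offline functions read the same `OnlineSummary`, so the stream is scanned once regardless of how many mining tasks are requested. That is the general idea the abstract attributes to concurrent mining, under the simplifying assumptions stated above.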
author2 Ng Wee Keong
author_facet Ng Wee Keong
Wang, Wenwen.
format Final Year Project
author Wang, Wenwen.
title Concurrent data stream mining
publishDate 2009
url http://hdl.handle.net/10356/16935