Concurrent data stream mining

Bibliographic Details
Main Author: Wang, Wenwen.
Other Authors: Ng Wee Keong
Format: Final Year Project
Language: English
Published: 2009
Online Access: http://hdl.handle.net/10356/16935
Institution: Nanyang Technological University
Description
Summary: Because streaming data can be read only once and arrives without end, it is desirable to perform multiple types of mining concurrently on a stream, to the fullest extent that resource constraints permit. To the best of our knowledge, however, conventional stream mining algorithms have focused on single, standalone mining tasks. In this report, we attempt to achieve concurrent classification and clustering on streaming data. Our integrated framework, MM-Stream, follows the conventional online-offline approach in stream mining. We describe the framework in terms of two components: as data streams in, the online component completes all necessary processing within constant time and then drops the data; when a user issues a mining request, the offline component performs the mining task(s) using the information collected by the online component. We implemented and evaluated the algorithm. The evaluation showed that the performance of MM-Stream is comparable to or better than that of existing standalone stream mining algorithms (D-Stream, On-Demand-Stream classifier). We also investigated how the quality of such integrated mining compares with the purity and accuracy of standalone clustering and classification, respectively. The results showed that concurrent mining achieves almost double the throughput without any degradation in the quality of the mining results. We believe that this successful experiment in concurrent stream classification and clustering with MM-Stream paves the way for incorporating more concurrent data mining tasks to maximize the output of stream data mining.
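
The online-offline split described in the summary can be illustrated with a minimal sketch. This is not the MM-Stream algorithm, whose details are not given in this record. It only assumes, for illustration, that the online component keeps per-grid-cell counts and label counts and that both mining tasks read from that one shared summary. The names `OnlineSummary`, `dense_cells`, `classify`, and the `cell_width` parameter are all hypothetical.

```python
from collections import Counter, defaultdict


class OnlineSummary:
    """Hypothetical online component: one cheap update per record, raw data is dropped."""

    def __init__(self, cell_width=1.0):
        self.cell_width = cell_width
        self.density = Counter()             # grid cell -> number of records seen
        self.labels = defaultdict(Counter)   # grid cell -> label -> count

    def _cell(self, point):
        # Map a point to the grid cell that contains it.
        return tuple(int(x // self.cell_width) for x in point)

    def update(self, point, label=None):
        # Only the counters change; the record itself is not stored.
        cell = self._cell(point)
        self.density[cell] += 1
        if label is not None:
            self.labels[cell][label] += 1


def dense_cells(summary, min_count):
    """Offline clustering request (assumed form): cells with at least min_count records."""
    return sorted(c for c, n in summary.density.items() if n >= min_count)


def classify(summary, point):
    """Offline classification request (assumed form): majority label of the point's cell."""
    counts = summary.labels.get(summary._cell(point))
    if not counts:
        return None  # no labeled record has fallen in this cell
    return counts.most_common(1)[0][0]
```

In this sketch a single pass over the stream feeds both requests: `update` is called once per record, and `dense_cells` and `classify` can be called at any later time without rereading the data.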