Toolkits development for high dimensional data analysis

Bibliographic Details
Main Author: Lin, Si Jie.
Other Authors: Chen Lihui
Format: Final Year Project
Language: English
Published: 2010
Online Access: http://hdl.handle.net/10356/40890
Institution: Nanyang Technological University
Description
Summary: To keep pace with the rapid growth of the World Wide Web, data analysis has become a widely used and necessary part of web usage. Many toolkits for analyzing web document data have been developed. These toolkits can improve the accuracy and efficiency with which users find relevant information on the internet. This report consists of four parts, corresponding to four high-dimensional data analysis toolkits designed and developed for various purposes. In the first part, data analysis toolkits with different document representation models and clustering methods are developed. In the second part, evaluation toolkits are developed. In the third part, data extraction toolkits based on the MEAD system are developed. Finally, additional functions are added to an existing system called iSEARCH, a search system that returns its results in clusters. The report explains the design and implementation of each part based on its requirements. The performance of each system is evaluated using standard evaluation metrics. The report concludes with the objectives achieved and some recommendations for future development.