Toolkits development for high dimensional data analysis
Format: | Final Year Project |
---|---|
Language: | English |
Published: | 2010 |
Online Access: | http://hdl.handle.net/10356/40890 |
Institution: | Nanyang Technological University |
Summary: | To keep up with the rapid growth of the World Wide Web, data analysis has become a widely used and necessary part of web usage. Many toolkits for analysing web documents have been developed. These toolkits help users find relevant information on the Internet more accurately and efficiently.
This report consists of four parts, corresponding to four high-dimensional data analysis toolkits designed and developed for different purposes. In the first part, data analysis toolkits with different document representation models and clustering methods are developed. In the second part, evaluation toolkits are developed. In the third part, data extraction toolkits based on the MEAD system are developed. In the fourth part, additional functions are added to an existing system called iSEARCH, a search system that returns its results grouped into clusters.
The report explains the design and implementation of each part in relation to its requirements. The performance of each system is evaluated using standard evaluation metrics. The report concludes with the objectives achieved and some recommendations for future development. |
---|