Development of data mining and knowledge discovery system

Bibliographic Details
Main Author: Pham, Tuong Minh
Other Authors: Hoi Chu Hong
Format: Final Year Project
Language: English
Published: 2014
Online Access: http://hdl.handle.net/10356/59125
Institution: Nanyang Technological University
Description
Summary: This project implemented a data mining web application powered by open-source data mining packages such as Weka and Mahout. Traditionally, users who want to use these packages must download and set them up on their own machines. This approach has two drawbacks: first, the setup process can be quite involved, and second, the user cannot process large data sets on a single machine. This calls for a data mining web application that is ready to use, provides a friendly user interface, and leverages the wide range of capabilities and the scalability of open-source packages. Some websites with the same purpose exist, such as BigML; however, these websites offer a very limited number of data mining algorithms. This project has implemented a data mining solution comprising 1) Weka, a data mining package developed by the University of Waikato, 2) Apache Mahout, a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms, and 3) a user-friendly web interface. Through this project, we have found that combining the capabilities of these packages into a web application is very promising. The system can be further improved by 1) allowing the user to interact with the results of the data mining process, 2) migrating the CSS framework to the latest version, 3) including a notification system, and 4) including more types of plots for visualization.