Clustering techniques for web mining

As high-dimensional data become more prevalent, feature selection has been widely applied in data mining, machine learning, and other fields. The goal of feature selection is to remove unneeded features, which might degrade the quality of discovered patterns. As a result, data...

Bibliographic Details
Main Author: Qiu, Siyuan.
Other Authors: Chen Lihui
Format: Final Year Project
Language: English
Published: 2012
Subjects:
Online Access: http://hdl.handle.net/10356/50226
Institution: Nanyang Technological University
id sg-ntu-dr.10356-50226
record_format dspace
spelling sg-ntu-dr.10356-50226 2023-07-07T16:39:40Z Clustering techniques for web mining Qiu, Siyuan. Chen Lihui School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems As high-dimensional data become more prevalent, feature selection has been widely applied in data mining, machine learning, and other fields. The goal of feature selection is to remove unneeded features, which might degrade the quality of discovered patterns. As a result, the data mining process can run faster and more accurately. Various feature selection approaches for text categorization have been proposed in the literature. In this project, a Multitype Features Coselection for Web Document Clustering (MFCC) approach was researched and implemented. MFCC is designed to better identify the most discriminative features and remove noisy ones. Besides implementing MFCC, the project also covered the data processing that transforms raw web documents into the format used by the MFCC Java program. Several simulations were then conducted to test the accuracy and efficiency of MFCC. Bachelor of Engineering 2012-05-31T03:02:30Z 2012-05-31T03:02:30Z 2012 2012 Final Year Project (FYP) http://hdl.handle.net/10356/50226 en Nanyang Technological University 72 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
spellingShingle DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Qiu, Siyuan.
Clustering techniques for web mining
description As high-dimensional data become more prevalent, feature selection has been widely applied in data mining, machine learning, and other fields. The goal of feature selection is to remove unneeded features, which might degrade the quality of discovered patterns. As a result, the data mining process can run faster and more accurately. Various feature selection approaches for text categorization have been proposed in the literature. In this project, a Multitype Features Coselection for Web Document Clustering (MFCC) approach was researched and implemented. MFCC is designed to better identify the most discriminative features and remove noisy ones. Besides implementing MFCC, the project also covered the data processing that transforms raw web documents into the format used by the MFCC Java program. Several simulations were then conducted to test the accuracy and efficiency of MFCC.
author2 Chen Lihui
author_facet Chen Lihui
Qiu, Siyuan.
format Final Year Project
author Qiu, Siyuan.
author_sort Qiu, Siyuan.
title Clustering techniques for web mining
title_short Clustering techniques for web mining
title_full Clustering techniques for web mining
title_fullStr Clustering techniques for web mining
title_full_unstemmed Clustering techniques for web mining
title_sort clustering techniques for web mining
publishDate 2012
url http://hdl.handle.net/10356/50226
_version_ 1772829090551693312