A proximity-based fuzzy clustering for web mining

Fuzzy C-means (FCM) clustering is one of the fundamental clustering techniques and has been widely used over the past 30 years, for example to cluster objects in image processing. However, FCM has some shortcomings; for example, it does not take users' habits or preferences into consideration....

Bibliographic Details
Main Author: Zhang, Tao
Other Authors: Chen Lihui
Format: Theses and Dissertations
Language: English
Published: 2014
Subjects: DRNTU::Engineering::Electrical and electronic engineering
Online Access: http://hdl.handle.net/10356/55253
Institution: Nanyang Technological University
id sg-ntu-dr.10356-55253
record_format dspace
spelling sg-ntu-dr.10356-55253 2023-07-04T15:39:42Z A proximity-based fuzzy clustering for web mining Zhang, Tao Chen Lihui School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering Master of Science (Signal Processing) 2014-01-07T04:43:23Z 2014-01-07T04:43:23Z 2013 2013 Thesis http://hdl.handle.net/10356/55253 en 105 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Electrical and electronic engineering
spellingShingle DRNTU::Engineering::Electrical and electronic engineering
Zhang, Tao
A proximity-based fuzzy clustering for web mining
description Fuzzy C-means (FCM) clustering is one of the fundamental clustering techniques and has been widely used over the past 30 years, for example to cluster objects in image processing. However, FCM has some shortcomings; for example, it does not take users' habits or preferences into consideration. Meanwhile, in web search services there is a substantial gap between what users expect and what they actually get. We therefore employ P-FCM, a proximity-based fuzzy C-means method proposed by W. Pedrycz et al. in 2003 [4]. As the name suggests, the supervision mechanism is realized through a number of proximity hints, or constraints, provided by the users, which specify the extent to which given pairs of patterns are regarded as similar or different. These hints act as prior knowledge for the clustering process and split the optimization into two phases. The first phase is the standard fuzzy C-means; the second phase is a gradient-driven minimization of the differences between the proximity constraints and the proximities computed from the partition matrix obtained in the first phase. We then put forward an improved method, the modified P-HFCM, which uses cosine distance instead of Euclidean distance to represent the relationship between documents. We reproduce two small-dataset examples from W. Pedrycz's paper, using Java and Matlab separately. In addition, we observe the performance of P-HFCM (E-Distance) and modified P-HFCM (C-Distance) on several high-dimensional datasets under different parameter settings. We set up a series of evaluation measures to compare the clustering results against the predefined ground truth from various perspectives, and we analyze how adjusting the parameters affects the clustering results.
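The two-phase idea in the description can be illustrated with a minimal Python sketch. It is not the thesis code: all function names, the `hints` dictionary, and the parameter defaults are assumptions made for illustration. The sketch covers phase one (standard fuzzy C-means with a pluggable Euclidean or cosine distance) and the quantity that phase two would work on, namely the gap between user-supplied proximity hints and the proximities induced by the partition matrix. The induced proximity is taken as the sum over clusters of the smaller of the two memberships, which is the form usually attributed to P-FCM; the gradient update itself is not shown. Plugging the cosine distance into the usual centroid and membership updates is a simplification, and the thesis may handle it differently.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def fcm(data, c, dist=euclidean, m=2.0, iters=100, seed=0):
    """Standard fuzzy C-means (phase one).

    Returns (u, centers), where u[i][k] is the membership of pattern k
    in cluster i and every column of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial partition matrix, normalized column by column.
    u = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for k in range(n):
        s = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= s
    centers = []
    for _ in range(iters):
        # Prototype update: mean of the patterns weighted by u^m.
        centers = []
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[k] * data[k][j] for k in range(n)) / total for j in range(dim)]
            )
        # Membership update from relative distances to the prototypes.
        for k in range(n):
            d = [max(dist(data[k], centers[i]), 1e-12) for i in range(c)]
            for i in range(c):
                u[i][k] = 1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
    return u, centers


def induced_proximity(u, k1, k2):
    """Proximity of patterns k1 and k2 implied by the partition matrix."""
    return sum(min(row[k1], row[k2]) for row in u)


def proximity_mismatch(u, hints):
    """Sum of squared gaps between hinted and induced proximities.

    hints is a hypothetical dict {(k1, k2): p} with p in [0, 1];
    a distance weighting of each pair is omitted here for brevity.
    """
    return sum((induced_proximity(u, k1, k2) - p) ** 2 for (k1, k2), p in hints.items())


if __name__ == "__main__":
    points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
    u, _ = fcm(points, c=2)
    # Hypothetical hints: patterns 0 and 1 belong together, 0 and 3 do not.
    print(proximity_mismatch(u, {(0, 1): 1.0, (0, 3): 0.0}))
```

Passing `dist=cosine_distance` to `fcm` swaps the distance without touching the rest of the loop, which mirrors the E-Distance versus C-Distance comparison mentioned in the description.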
author2 Chen Lihui
author_facet Chen Lihui
Zhang, Tao
format Theses and Dissertations
author Zhang, Tao
author_sort Zhang, Tao
title A proximity-based fuzzy clustering for web mining
title_short A proximity-based fuzzy clustering for web mining
title_full A proximity-based fuzzy clustering for web mining
title_fullStr A proximity-based fuzzy clustering for web mining
title_full_unstemmed A proximity-based fuzzy clustering for web mining
title_sort proximity-based fuzzy clustering for web mining
publishDate 2014
url http://hdl.handle.net/10356/55253
_version_ 1772827176821850112