A proximity-based fuzzy clustering for web mining

Fuzzy C-Means clustering (FCM) is one of the fundamental clustering techniques and has been widely used over the past 30 years, for example to cluster objects in image processing. However, FCM has shortcomings; for example, it does not take users' habits or preferences into consideration....
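
For readers unfamiliar with FCM, the following is a minimal sketch of the standard iteration in plain Python, not the thesis's implementation. The function name `fcm`, its parameters (`m`, `iters`, `tol`, `seed`) and the random initialisation are assumptions made for illustration; the record itself gives no code.

```python
import math
import random


def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means with Euclidean distance (illustrative sketch).

    Returns (u, centers); u[i][k] is the membership of point k in cluster i.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial partition matrix; each column is normalised to sum to 1.
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        total = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= total
    centers = []
    for _ in range(iters):
        # Prototype update: mean of the points weighted by u^m.
        centers = []
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            sw = sum(w)
            centers.append([sum(w[k] * points[k][d] for k in range(n)) / sw
                            for d in range(dim)])
        # Membership update from the distances to the prototypes.
        shift = 0.0
        for k in range(n):
            dist = [math.dist(points[k], centers[i]) for i in range(c)]
            zeros = [i for i in range(c) if dist[i] == 0.0]
            if zeros:
                # Point coincides with a prototype: share membership among those.
                new = [1.0 / len(zeros) if i in zeros else 0.0 for i in range(c)]
            else:
                exp = 2.0 / (m - 1.0)
                new = [1.0 / sum((dist[i] / dist[j]) ** exp for j in range(c))
                       for i in range(c)]
            for i in range(c):
                shift = max(shift, abs(new[i] - u[i][k]))
                u[i][k] = new[i]
        if shift < tol:
            break
    return u, centers
```

Calling `fcm(points, c=2)` on a list of coordinate tuples returns a partition matrix whose columns each sum to 1. Nothing in this loop uses user feedback, which is the shortcoming the abstract points to.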

Bibliographic Details
Main Author:	Zhang, Tao
Other Authors:	Chen Lihui
Format:	Theses and Dissertations
Language:	English
Published:	2014
Subjects:	DRNTU::Engineering::Electrical and electronic engineering
Online Access:	http://hdl.handle.net/10356/55253
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-55253
record_format	dspace
spelling	sg-ntu-dr.10356-55253 2023-07-04T15:39:42Z A proximity-based fuzzy clustering for web mining Zhang, Tao Chen Lihui School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering Master of Science (Signal Processing) 2014-01-07T04:43:23Z 2014-01-07T04:43:23Z 2013 2013 Thesis http://hdl.handle.net/10356/55253 en 105 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Electrical and electronic engineering
spellingShingle	DRNTU::Engineering::Electrical and electronic engineering Zhang, Tao A proximity-based fuzzy clustering for web mining
description	Fuzzy C-Means clustering (FCM) is one of the fundamental clustering techniques and has been widely used over the past 30 years, for example to cluster objects in image processing. However, FCM has shortcomings; for example, it does not take users' habits or preferences into consideration. Meanwhile, in web search services there is a substantial gap between what users expect and what they actually get. We therefore employ P-FCM, a proximity-based fuzzy C-means method proposed by W. Pedrycz et al. in 2003 [4]. As the name implies, supervision is realized through a number of proximity hints or constraints provided by the users, which specify the extent to which pairs of patterns are regarded as relevant or different. These hints act as prior knowledge for the clustering process and split the optimization into two phases. The first phase is standard fuzzy C-means; the second is a gradient-driven minimization of the differences between the proximity constraints and the proximities computed from the partition matrix obtained in the first phase. We then put forward an improved method, the modified P-HFCM, which uses cosine distance instead of Euclidean distance to represent the relationship between documents. We reproduce two small-dataset examples from W. Pedrycz's paper, separately in Java and Matlab. We also examine the performance of P-HFCM (E-Distance) and the modified P-HFCM (C-Distance) on several high-dimensional datasets with different parameter settings. Finally, we set up a series of evaluation measures that compare the clustering results with the predefined ground truth in various respects, and we analyze how adjusting the parameters affects the clustering results.
author2	Chen Lihui
author_facet	Chen Lihui Zhang, Tao
format	Theses and Dissertations
author	Zhang, Tao
author_sort	Zhang, Tao
title	A proximity-based fuzzy clustering for web mining
title_short	A proximity-based fuzzy clustering for web mining
title_full	A proximity-based fuzzy clustering for web mining
title_fullStr	A proximity-based fuzzy clustering for web mining
title_full_unstemmed	A proximity-based fuzzy clustering for web mining
title_sort	proximity-based fuzzy clustering for web mining
publishDate	2014
url	http://hdl.handle.net/10356/55253
_version_	1772827176821850112
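
The two ideas the description relies on, the choice of distance and the comparison of user hints with proximities computed from the partition matrix, can be sketched as follows. This is an illustrative sketch only: the record gives no formulas, so the min-based induced proximity (the form usually associated with P-FCM) and the unweighted squared mismatch are assumptions, and all function names are hypothetical.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_distance(x, y):
    """1 - cosine similarity: near 0 for parallel vectors, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        # Assumed convention: an all-zero vector is maximally distant.
        return 1.0
    return 1.0 - dot / (nx * ny)


def induced_proximity(u, k1, k2):
    """Proximity of patterns k1 and k2 implied by partition matrix u.

    u[i][k] is the membership of pattern k in cluster i. Assumed form:
    the sum over clusters of the smaller of the two memberships.
    """
    return sum(min(row[k1], row[k2]) for row in u)


def proximity_mismatch(u, hints):
    """Sum of squared differences between user hints and induced proximities.

    hints maps a pair (k1, k2) to a user-supplied proximity in [0, 1].
    Simplified: any weighting of the terms is omitted here.
    """
    return sum((induced_proximity(u, k1, k2) - p) ** 2
               for (k1, k2), p in hints.items())
```

For term-count vectors such as `[3, 0, 1]` and `[6, 0, 2]`, the cosine distance is close to 0 while the Euclidean distance is not, which is one common motivation for preferring cosine distance on documents of different lengths. A second phase in the spirit of the description would adjust the memberships by gradient steps so that `proximity_mismatch` decreases.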

Similar Items