Empirical analysis of rough set categorical clustering techniques based on rough purity and value set

Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, attention has been put on categorical data clustering, where data objects are made up of non-numerical attributes. The implementation of several existing categorical clustering techniques is c...

Bibliographic Details
Main Author:	Uddin, Jamal
Format:	Thesis
Language:	English
Published:	2017
Subjects:	QA76 Computer software
Online Access:	http://eprints.uthm.edu.my/336/1/JAMAL%20UDDIN%20WATERMARK.pdf http://eprints.uthm.edu.my/336/2/24p%20JAMAL%20UDDIN.pdf http://eprints.uthm.edu.my/336/
Institution:	Universiti Tun Hussein Onn Malaysia

id	my.uthm.eprints.336
record_format	eprints
spelling	my.uthm.eprints.336 2021-07-22T07:09:41Z http://eprints.uthm.edu.my/336/ 2017-08 Thesis NonPeerReviewed Uddin, Jamal (2017) Empirical analysis of rough set categorical clustering techniques based on rough purity and value set. Doctoral thesis, Universiti Tun Hussein Onn Malaysia.
institution	Universiti Tun Hussein Onn Malaysia
building	UTHM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Tun Hussein Onn Malaysia
content_source	UTHM Institutional Repository
url_provider	http://eprints.uthm.edu.my/
language	English
topic	QA76 Computer software
spellingShingle	QA76 Computer software Uddin, Jamal Empirical analysis of rough set categorical clustering techniques based on rough purity and value set
description	Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, attention has turned to categorical data clustering, where data objects are made up of non-numerical attributes. Implementing several existing categorical clustering techniques is challenging, as some are unable to handle uncertainty and others have stability issues. For dealing with categorical data and handling uncertainty, rough set theory has become a well-established mechanism in a wide variety of applications, including databases. Recent techniques such as Information-Theoretic Dependency Roughness (ITDR), Maximum Dependency Attribute (MDA) and Maximum Significance Attribute (MSA) outperformed predecessor approaches such as Bi-Clustering (BC), Total Roughness (TR), Min-Min Roughness (MMR) and Standard-Deviation Roughness (SDR). This work explores the limitations and issues of the ITDR, MDA and MSA techniques on data sets where these techniques fail to select, or face difficulty in selecting, their best clustering attribute. Accordingly, two alternative techniques named Rough Purity Approach (RPA) and Maximum Value Attribute (MVA) are proposed. The novelty of the proposed approaches is that RPA presents a new uncertainty definition based on the purity of a rough relational database, whereas MVA, unlike other rough set theory techniques, uses domain knowledge such as the value set combined with the number of clusters (NoC). To show the significance and the mathematical and theoretical basis of the proposed approaches, several propositions are illustrated. Moreover, recent rough categorical techniques (MDA, MSA and ITDR) and a classical clustering technique, simple K-means, are used for comparison, and the results are presented in tabular and graphical forms. For the experiments, data sets from previously published research cases, a real supply base management (SBM) data set and the UCI repository are used. The results reveal significant improvement by the proposed techniques for categorical clustering in terms of purity (21%), entropy (9%), accuracy (16%), rough accuracy (11%), iterations (99%) and time (93%).
format	Thesis
author	Uddin, Jamal
author_facet	Uddin, Jamal
author_sort	Uddin, Jamal
title	Empirical analysis of rough set categorical clustering techniques based on rough purity and value set
title_short	Empirical analysis of rough set categorical clustering techniques based on rough purity and value set
title_full	Empirical analysis of rough set categorical clustering techniques based on rough purity and value set
title_fullStr	Empirical analysis of rough set categorical clustering techniques based on rough purity and value set
title_full_unstemmed	Empirical analysis of rough set categorical clustering techniques based on rough purity and value set
title_sort	empirical analysis of rough set categorical clustering techniques based on rough purity and value set
publishDate	2017
url	http://eprints.uthm.edu.my/336/1/JAMAL%20UDDIN%20WATERMARK.pdf http://eprints.uthm.edu.my/336/2/24p%20JAMAL%20UDDIN.pdf http://eprints.uthm.edu.my/336/
_version_	1738580723853426688


Similar Items