Rough set approach for categorical data clustering
Main Author: | Herawan, Tutut |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2010 |
Subjects: | QA Mathematics; QA71-90 Instruments and machines |
Online Access: | http://eprints.uthm.edu.my/3609/1/24p%20TUTUT%20HERAWAN.pdf http://eprints.uthm.edu.my/3609/2/TUTUT%20HERAWAN%20COPYRIGHT%20DECLARATION.pdf http://eprints.uthm.edu.my/3609/3/TUTUT%20HERAWAN%20WATERMARK.pdf http://eprints.uthm.edu.my/3609/ |
Institution: | Universiti Tun Hussein Onn Malaysia |
id | my.uthm.eprints.3609 |
---|---|
record_format | eprints |
spelling | my.uthm.eprints.3609 (2022-02-03T01:53:46Z), 2010-03, Thesis, NonPeerReviewed. Herawan, Tutut (2010) Rough set approach for categorical data clustering. Doctoral thesis, Universiti Tun Hussein Onn Malaysia. |
institution | Universiti Tun Hussein Onn Malaysia |
building | UTHM Library |
collection | Institutional Repository |
continent | Asia |
country | Malaysia |
content_provider | Universiti Tun Hussein Onn Malaysia |
content_source | UTHM Institutional Repository |
url_provider | http://eprints.uthm.edu.my/ |
language | English |
topic | QA Mathematics; QA71-90 Instruments and machines |
description |
A few rough set techniques exist for clustering categorical data, that is, for grouping objects that have similar characteristics. However, their performance is limited by low accuracy, high computational complexity and low cluster purity. This work proposes a new technique, Maximum Dependency Attributes (MDA), to address these issues. MDA is based on rough set theory and takes into account the dependency between the attributes of an information system. The main contribution is a new technique for classifying objects in categorical datasets that performs better than the baseline techniques.
The proposed algorithm is implemented in MATLAB® version 7.6.0.324 (R2008a), and the algorithms are executed sequentially on an Intel Core 2 Duo processor with 1 GB of main memory under Windows XP Professional SP3. Experiments on selecting a clustering attribute, covering four small datasets and thirteen UCI benchmark datasets, show that MDA outperforms the BC, TR and MMR techniques in terms of both accuracy and computational complexity. In terms of cluster purity, MDA achieves up to 17% better purity on the Soybean dataset and up to 9% better purity on the Zoo dataset.
An experiment on supply chain management clustering further demonstrates how MDA can contribute to a practical system, with improvements of up to 90% in computational complexity and up to 23% in cluster purity. |
format | Thesis |
author | Herawan, Tutut |
title | Rough set approach for categorical data clustering |
publishDate | 2010 |
url | http://eprints.uthm.edu.my/3609/1/24p%20TUTUT%20HERAWAN.pdf http://eprints.uthm.edu.my/3609/2/TUTUT%20HERAWAN%20COPYRIGHT%20DECLARATION.pdf http://eprints.uthm.edu.my/3609/3/TUTUT%20HERAWAN%20WATERMARK.pdf http://eprints.uthm.edu.my/3609/ |
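The abstract describes MDA only at a high level. The following is a rough illustration of the ideas it names. It computes the rough-set dependency degree of one categorical attribute on another, picks the attribute with the highest maximum dependency as the clustering attribute, splits the objects by that attribute's values, and scores the split with the common majority-label purity measure. This is a minimal Python sketch under stated assumptions, not the thesis's MATLAB implementation. The dependency direction (degree to which `b` depends on `a`), the tie-breaking rule, one-cluster-per-value splitting, the purity formula (the thesis may define purity differently), every function name and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def partition(rows, attrs):
    """Equivalence classes U/IND(attrs): objects that agree on every attribute in attrs."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())  # in order of first appearance


def dependency(rows, a, b):
    """Degree k = |POS_a(b)| / |U| to which attribute b depends on attribute a (assumed direction)."""
    b_classes = partition(rows, [b])
    positive = 0
    for block in partition(rows, [a]):
        # An a-class is in the positive region if it lies inside a single b-class.
        if any(block <= y for y in b_classes):
            positive += len(block)
    return positive / len(rows)


def select_clustering_attribute(rows, attrs):
    """MDA-style choice: the attribute with the highest maximum dependency degree.

    Assumption: ties are broken by the next-highest degree, which is what the
    lexicographic comparison of descending-sorted degree lists does here.
    """
    best, best_degrees = None, None
    for a in attrs:
        degrees = sorted((dependency(rows, a, b) for b in attrs if b != a), reverse=True)
        if best_degrees is None or degrees > best_degrees:
            best, best_degrees = a, degrees
    return best, best_degrees


def purity(clusters, labels):
    """Share of objects that carry the majority class label of their cluster."""
    total = sum(len(c) for c in clusters)
    majority = sum(max(Counter(labels[i] for i in c).values()) for c in clusters)
    return majority / total


# Hypothetical toy table (colour, size, shape) with made-up class labels.
rows = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("green", "small", "square"),
]
labels = ["A", "A", "B", "A", "B"]

# Assumption: one cluster per value of the selected attribute.
attr, degrees = select_clustering_attribute(rows, [0, 1, 2])
clusters = partition(rows, [attr])
print(attr, degrees, clusters, purity(clusters, labels))
```

On the toy table this prints `0 [1.0, 0.6] [{0, 1}, {2, 3}, {4}] 0.8`. Colour is chosen because size depends on it completely (degree 1.0). Rows are represented as tuples indexed by attribute position, so the same functions work for any fixed-width categorical table.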