Modelling kernel methods for unsupervised learning of microarray data
Main Author: | |
---|---|
Format: | Monograph |
Language: | English |
Published: | Faculty of Computer Science and Information System, 2008 |
Subjects: | |
Online Access: | http://eprints.utm.my/id/eprint/5818/1/78096.pdf http://eprints.utm.my/id/eprint/5818/ |
Institution: | Universiti Teknologi Malaysia |
Summary: | Unsupervised learning, mostly represented by data clustering methods, is an important machine learning technique. Data clustering analysis has been extensively applied to extract information from microarray gene expression data. However, finding good-quality clusters in gene expression data is particularly challenging because of its peculiar characteristics, such as nonlinear separability, outliers, high dimensionality, and diverse cluster structures. This study therefore combines kernel methods, which can both handle high dimensionality and discover nonlinear relationships in the data, with the approximate reasoning offered by the fuzzy approach. To this end, it presents a robust Weighted Kernel Fuzzy C-Means incorporating local approximation (WKFCM). In WKFCM, the fuzzy membership of each object is approximated from the memberships of its neighbouring objects. This combines the strengths of partitioning and density-based clustering approaches and substantially improves the unsupervised analysis of the data. A comparative analysis with K-means, hierarchical clustering, fuzzy C-means, and fuzzy self-organizing maps showed that, although different types of datasets are better partitioned by different algorithms, WKFCM displays the best overall performance; it can capture nonlinear relationships and non-globular clusters, and identify cluster outliers. |
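The record describes WKFCM only at abstract level: kernel-based fuzzy memberships, with each object's membership approximated from those of its neighbours. The sketch below is an illustrative assumption, not the author's algorithm. It runs the standard kernel fuzzy C-means updates with a Gaussian kernel (prototypes kept in input space), then replaces each object's memberships by the plain average over itself and its nearest neighbours. The function names, the uniform neighbour weighting, the farthest-first initialisation, and the parameters `sigma` and `k` are all hypothetical; the record does not say how WKFCM weights objects or neighbourhoods.

```python
import math

def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def gaussian_kernel(a, b, sigma):
    # Gaussian (RBF) kernel K(a, b) = exp(-||a - b||^2 / sigma^2).
    return math.exp(-sq_dist(a, b) / sigma ** 2)

def neighbourhoods(data, k):
    # Hypothetical neighbourhood rule: each object plus its k - 1 nearest other objects.
    n = len(data)
    result = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sq_dist(data[i], data[j]))
        result.append([i] + others[:k - 1])
    return result

def farthest_first(data, c):
    # Assumed deterministic initialisation: start at object 0, then repeatedly
    # add the object farthest from every prototype chosen so far.
    chosen = [0]
    while len(chosen) < c:
        chosen.append(max(range(len(data)),
                          key=lambda j: min(sq_dist(data[j], data[q]) for q in chosen)))
    return [list(data[i]) for i in chosen]

def kernel_fcm_local(data, c, m=2.0, sigma=1.0, k=3, max_iter=100, tol=1e-6):
    # Hypothetical kernel FCM with neighbour-averaged memberships (not the published WKFCM).
    n, dim = len(data), len(data[0])
    protos = farthest_first(data, c)
    neigh = neighbourhoods(data, k)
    u = []
    for _ in range(max_iter):
        # kern[i][j] = K(x_j, v_i); the kernel-induced squared distance is 2 * (1 - K).
        kern = [[gaussian_kernel(x, v, sigma) for x in data] for v in protos]
        # 1) Kernel FCM membership update: u_ij proportional to (1 / (1 - K_ij))^(1 / (m - 1)).
        raw = [[0.0] * n for _ in range(c)]
        for j in range(n):
            inv = [(1.0 / max(1.0 - kern[i][j], 1e-12)) ** (1.0 / (m - 1)) for i in range(c)]
            total = sum(inv)
            for i in range(c):
                raw[i][j] = inv[i] / total
        # 2) Local approximation (assumed form): average each object's memberships
        #    over its neighbourhood; each object's memberships still sum to 1.
        u = [[sum(raw[i][q] for q in neigh[j]) / len(neigh[j]) for j in range(n)]
             for i in range(c)]
        # 3) Prototype update: v_i = sum_j u_ij^m K_ij x_j / sum_j u_ij^m K_ij.
        new_protos = []
        for i in range(c):
            w = [u[i][j] ** m * kern[i][j] for j in range(n)]
            total = sum(w)
            if total == 0.0:  # every object is kernel-far from v_i; keep it in place
                new_protos.append(list(protos[i]))
                continue
            new_protos.append([sum(w[j] * data[j][d] for j in range(n)) / total
                               for d in range(dim)])
        shift = max(abs(a - b) for p, q in zip(protos, new_protos) for a, b in zip(p, q))
        protos = new_protos
        if shift < tol:
            break
    return u, protos

# Usage: two well-separated groups of four 2-D points.
data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
        [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]]
u, protos = kernel_fcm_local(data, c=2, sigma=2.0, k=3)
labels = [max(range(2), key=lambda i: u[i][j]) for j in range(len(data))]
print(labels)
```

Averaging over a neighbourhood pulls an isolated object's memberships toward those of nearby objects. This is one plausible way to blend a partitioning update with local density information; the weighting used in the actual WKFCM may differ.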