Multiple centers based fuzzy clustering for imbalanced data
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2016 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/68529 |
Institution: | Nanyang Technological University |
Summary: | Clustering is a useful data mining technique for identifying interesting distributions and discovering groups in the underlying data. K-means is a well-known and widely used family of clustering techniques, valued for its low computational cost; it mainly comprises the hard k-means and the fuzzy k-means clustering algorithms.
Many factors can affect the performance of k-means clustering algorithms, such as high dimensionality, the scale of the data, and noise. The data distribution is another important factor, and it can significantly affect both hard and fuzzy k-means clustering. The problem caused by imbalanced data is known as the “uniform effect”.
This thesis studies and implements the multicenter clustering algorithm (MC) [6], which aims to solve the “uniform effect”. The MC algorithm consists of three sub-algorithms: the fast global fuzzy k-means algorithm (FGFKM), the best m-plot algorithm (BMP), and the grouping multicenter algorithm (GMC). An experimental study of MC and its three sub-algorithms has been conducted to evaluate their performance, and MC has been compared with related algorithms on several datasets. |
---|