Multiple centers based fuzzy clustering for imbalanced data

Clustering for data mining is a useful technique in terms of identifying interesting distributions and discovering groups in the underlying data. K-means is a particular clustering technique that is world-renowned and widely spread for its low computational cost, which mainly includes the hard k-mea...

Full description

Saved in:
Bibliographic Details
Main Author: Liao, Hongda
Other Authors: Chen Lihui
Format: Theses and Dissertations
Language:English
Published: 2016
Subjects:
Online Access:http://hdl.handle.net/10356/68529
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Description
Summary:Clustering for data mining is a useful technique in terms of identifying interesting distributions and discovering groups in the underlying data. K-means is a particular clustering technique that is world-renowned and widely spread for its low computational cost, which mainly includes the hard k-means clustering algorithms and the fuzzy k-means clustering algorithms. There are many factors that may affect the performance of the k-means clustering algorithms, such as high dimensionality, scales of the data, noise, etc. And the data distribution is also an important factor that can affect the performance of the k-means clustering algorithm significantly, not only for the hard k-means clustering, but also for the fuzzy k-means clustering. The problem caused by the imbalanced data is also called the “uniform effect”. In this thesis, the multicenter clustering algorithm (MC) [6] has been studied and implemented, which aims to solve “uniform effect”. The MC clustering algorithm contains three sub algorithms, which are the fast global fuzzy k-mean algorithm (FGFKM), the best m-plot algorithm (BMP) and the grouping multicenter algorithm (GMC). The experimental study of the MC, and its three sub-algorithms has been conducted, and the performance of the algorithms is evaluated. Comparisons between MC and its related algorithms have been made using several datasets.