Incremental clustering methods and their applications on large data analysis
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2016 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/66240 |
Institution: | Nanyang Technological University |
Summary: | Large data sets have become prevalent because huge amounts of data can be collected
easily every day. In this thesis, we propose three new incremental clustering approaches
for large relational and multi-view data analysis, based on fuzzy clustering and affinity propagation based clustering.
In the incremental clustering framework, the key idea is to find some representatives to
represent each cluster in a data chunk; the final data analysis is then carried out
on all the identified representatives. To represent the structure of a cluster
more accurately and improve clustering performance, our work considers two directions:
using multiple representatives to represent each cluster, and exploring multi-view
features. The first proposed approach is called incremental multiple
medoids based fuzzy clustering (IMMFC). In IMMFC, multiple representatives, referred
to as medoids, are identified for each cluster in a data chunk by introducing a
weight for each object, which measures how well the object represents the cluster
in a chunk. We also propose a mechanism that uses some relationships among the
identified medoids in each chunk as side information to guide the generation of the
final set of medoids. The second approach is called incremental multiple exemplars
affinity propagation (IMEAP), which is an affinity propagation based approach. The
multiple representatives, referred to as exemplars, are identified by our newly proposed
approach called K-MEAP. K-MEAP is insensitive to initialization and able to handle
non-symmetric relational data. Inheriting the advantages of K-MEAP, IMEAP is
able to handle large non-symmetric relational data. Our third approach, called incremental
multi-view fuzzy clustering based on minimax optimization (IMinimaxFCM), is
proposed to handle large multi-view vector data. In IMinimaxFCM, the identification
of representatives, referred to as centroids, is based on our newly proposed approach called
MinimaxFCM. In MinimaxFCM, the consensus clustering result is generated by
minimax optimization, in which the maximum disagreements of different weighted
views are minimized. Each data object in IMinimaxFCM has different views, and
the final set of multi-view centroids representing the entire data set is obtained by
clustering the identified centroids of different views from all the chunks. |
---|
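
The chunk-based framework described in the summary (find representatives in each data chunk, then run the final analysis on the pooled representatives) can be illustrated with a minimal sketch. This is not IMMFC, IMEAP, or IMinimaxFCM: as a stand-in it uses plain k-medoids on Euclidean distances, with no fuzzy memberships, object weights, or side information. The function names `pairwise_dist`, `k_medoids`, and `incremental_cluster`, and the parameters `chunk_size` and `seed`, are assumptions made for illustration only.

```python
import numpy as np

def pairwise_dist(X, Y=None):
    """Euclidean distance matrix between the rows of X and the rows of Y."""
    Y = X if Y is None else Y
    return np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)

def k_medoids(dist, k, n_iter=100, seed=0):
    """Plain k-medoids on a precomputed distance matrix (stand-in, not IMMFC)."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(dist.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        # Assign every object to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if a cluster is empty
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

def incremental_cluster(X, k, chunk_size, seed=0):
    """Cluster X chunk by chunk, then cluster the pooled representatives."""
    reps = []
    for start in range(0, len(X), chunk_size):
        chunk = X[start:start + chunk_size]
        k_chunk = min(k, len(chunk))
        medoids, _ = k_medoids(pairwise_dist(chunk), k_chunk, seed=seed)
        reps.append(chunk[medoids])  # keep only this chunk's representatives
    reps = np.vstack(reps)
    # Final analysis runs on the representatives only, not on all of X.
    final_idx, _ = k_medoids(pairwise_dist(reps), k, seed=seed)
    final = reps[final_idx]
    labels = np.argmin(pairwise_dist(X, final), axis=1)
    return final, labels
```

Only one chunk's distance matrix and the small pool of representatives are held in memory at a time, which is the point of the incremental setting. In this simplified version each cluster in a chunk contributes a single medoid, whereas the thesis approaches identify multiple representatives per cluster.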