Incremental clustering methods and their applications on large data analysis
Large data has become prevalent because huge amounts of data can be collected easily every day. In this thesis, we propose three new incremental clustering approaches for large relational and multi-view data analysis based on fuzzy clustering and affinity propagation based clustering. In i...
Main Author: | Wang, Yangtao |
---|---|
Other Authors: | Chen Lihui |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2016 |
Subjects: | DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition |
Online Access: | http://hdl.handle.net/10356/66240 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-66240 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-66240 2023-07-04T16:12:27Z Incremental clustering methods and their applications on large data analysis Wang, Yangtao Chen Lihui School of Electrical and Electronic Engineering DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition Large data has become prevalent because huge amounts of data can be collected easily every day. In this thesis, we propose three new incremental clustering approaches for large relational and multi-view data analysis based on fuzzy clustering and affinity propagation based clustering. In the incremental clustering framework, the key idea is to find some representatives to represent each cluster in a data chunk, and the final data analysis is carried out based on all the identified representatives. In order to represent the structure of a cluster more accurately and improve the clustering performance, using multiple representatives to represent each cluster and exploring multi-view features are two directions considered in our work. The first proposed approach is called incremental multiple medoids based fuzzy clustering (IMMFC). In IMMFC, multiple representatives, referred to as medoids, are identified for each cluster in a data chunk by introducing a weight for each object, which measures how well the object represents the cluster in a chunk. We also propose a mechanism to make use of some relationships of identified medoids in each chunk as side information to guide the generation of the final set of medoids. The second approach is called incremental multiple exemplars affinity propagation (IMEAP), which is an affinity propagation based approach. The multiple representatives, referred to as exemplars, are identified based on our newly proposed approach called K-MEAP. K-MEAP is insensitive to the initialization and able to handle non-symmetric relational data. Inheriting the advantages of K-MEAP, IMEAP is able to handle large non-symmetric relational data.
Our third approach, called incremental multi-view fuzzy clustering based on minimax optimization (IMinimaxFCM), is proposed to handle large multi-view vector data. In IMinimaxFCM, the identification of representatives, referred to as centroids, is based on our newly proposed approach called MinimaxFCM. In MinimaxFCM, the consensus clustering result is generated based on minimax optimization, in which the maximum disagreements of different weighted views are minimized. Each data object in IMinimaxFCM has different views, and the final set of multi-view centroids to represent the entire data set is achieved by clustering the identified centroids of different views from all the chunks. Doctor of Philosophy (EEE) 2016-03-21T06:59:41Z 2016-03-21T06:59:41Z 2016 Thesis Wang, Y. (2016). Incremental clustering methods and their applications on large data analysis. Doctoral thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/66240 en 173 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition |
spellingShingle |
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition Wang, Yangtao Incremental clustering methods and their applications on large data analysis |
description |
Large data has become prevalent because huge amounts of data can be collected
easily every day. In this thesis, we propose three new incremental clustering approaches
for large relational and multi-view data analysis based on fuzzy clustering and affinity
propagation based clustering.
In the incremental clustering framework, the key idea is to find some representatives to
represent each cluster in a data chunk, and the final data analysis is carried out based
on all the identified representatives. In order to represent the structure of a cluster
more accurately and improve the clustering performance, using multiple representatives
to represent each cluster and exploring multi-view features are two directions
considered in our work. The first proposed approach is called incremental multiple
medoids based fuzzy clustering (IMMFC). In IMMFC, multiple representatives, referred
to as medoids, are identified for each cluster in a data chunk by introducing a
weight for each object, which measures how well the object represents the cluster
in a chunk. We also propose a mechanism to make use of some relationships of
identified medoids in each chunk as side information to guide the generation of the
final set of medoids. The second approach is called incremental multiple exemplars
affinity propagation (IMEAP), which is an affinity propagation based approach. The
multiple representatives, referred to as exemplars, are identified based on our newly
proposed approach called K-MEAP. K-MEAP is insensitive to the initialization and able
to handle non-symmetric relational data. Inheriting the advantages of K-MEAP, IMEAP is
able to handle large non-symmetric relational data. Our third approach, called incremental
multi-view fuzzy clustering based on minimax optimization (IMinimaxFCM), is
proposed to handle large multi-view vector data. In IMinimaxFCM, the identification
of representatives, referred to as centroids, is based on our newly proposed approach
called MinimaxFCM. In MinimaxFCM, the consensus clustering result is generated based
on minimax optimization, in which the maximum disagreements of different weighted
views are minimized. Each data object in IMinimaxFCM has different views, and
the final set of multi-view centroids to represent the entire data set is achieved by
clustering the identified centroids of different views from all the chunks. |
author2 |
Chen Lihui |
author_facet |
Chen Lihui Wang, Yangtao |
format |
Theses and Dissertations |
author |
Wang, Yangtao |
author_sort |
Wang, Yangtao |
title |
Incremental clustering methods and their applications on large data analysis |
publishDate |
2016 |
url |
http://hdl.handle.net/10356/66240 |
_version_ |
1772828423872315392 |
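
The abstract describes a general incremental framework: find representatives for each cluster in every data chunk, then run the final analysis on the collected representatives only. The sketch below illustrates that idea for plain vector data, and it is not the thesis's IMMFC, IMEAP, or IMinimaxFCM. It assumes a single centroid per cluster (rather than multiple representatives), standard fuzzy c-means with a farthest-first initialisation, and a final pass that weights each chunk centroid by the fuzzy membership mass it collected. The function names `fuzzy_c_means` and `incremental_cluster` and all parameters are hypothetical, chosen for illustration.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0, weights=None):
    """Weighted fuzzy c-means on vector data (illustrative sketch).

    Returns the centroids and the membership matrix from the last iteration.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)

    # Farthest-first initialisation (an assumption, not from the abstract):
    # start from a random point, then repeatedly add the point farthest
    # from the ones already chosen.
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(n))]
    for _ in range(c - 1):
        d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    centroids = X[idx].copy()

    for _ in range(n_iter):
        # Squared distances, shape (n, c); the epsilon avoids division by zero.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2) + 1e-12
        # Standard FCM membership update: u_ik proportional to d_ik^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Centroid update, with optional per-object weights.
        um = (U ** m) * w[:, None]
        new_centroids = (um.T @ X) / um.sum(axis=0)[:, None]
        shift = np.abs(new_centroids - centroids).max()
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, U


def incremental_cluster(chunks, c, m=2.0):
    """Cluster each chunk, then cluster the collected chunk representatives."""
    reps, rep_weights = [], []
    for i, chunk in enumerate(chunks):
        centroids, U = fuzzy_c_means(chunk, c, m=m, seed=i)
        reps.append(centroids)
        # Assumption: weight a representative by its total fuzzy membership.
        rep_weights.append(U.sum(axis=0))
    reps = np.vstack(reps)
    rep_weights = np.concatenate(rep_weights)
    # The final analysis uses the representatives only, never the raw chunks.
    final_centroids, _ = fuzzy_c_means(reps, c, m=m, weights=rep_weights)
    return final_centroids
```

A call such as `incremental_cluster(np.array_split(data, 3), c=2)` processes three chunks in turn, so at most one chunk plus the accumulated representatives would need to be held in memory at a time. Handling relational (pairwise dissimilarity) data or multiple views, as the thesis does, would require a different per-chunk clustering step.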