Incremental clustering methods and their applications on large data analysis

Bibliographic Details
Main Author: Wang, Yangtao
Other Authors: Chen Lihui
Format: Theses and Dissertations
Language: English
Published: 2016
Subjects: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
Online Access:http://hdl.handle.net/10356/66240
Institution: Nanyang Technological University
id sg-ntu-dr.10356-66240
record_format dspace
School: School of Electrical and Electronic Engineering
Degree: Doctor of Philosophy (EEE)
Date added: 2016-03-21
Extent: 173 p. (application/pdf)
Citation: Wang, Y. (2016). Incremental clustering methods and their applications on large data analysis. Doctoral thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/66240
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
description Large data has become prevalent because huge amounts of data can now be collected easily every day. In this thesis, we propose three new incremental clustering approaches for large relational and multi-view data analysis, based on fuzzy clustering and affinity propagation based clustering. In the incremental clustering framework, the key idea is to identify representatives for each cluster in a data chunk and then carry out the final data analysis on all the identified representatives. To represent the structure of a cluster more accurately and improve clustering performance, our work pursues two directions: using multiple representatives for each cluster, and exploiting multi-view features.
The first proposed approach is called incremental multiple medoids based fuzzy clustering (IMMFC). In IMMFC, multiple representatives, referred to as medoids, are identified for each cluster in a data chunk by introducing a weight for each object that measures how well the object represents the cluster in that chunk. We also propose a mechanism that uses relationships among the medoids identified in each chunk as side information to guide the generation of the final set of medoids.
The second approach, called incremental multiple exemplars affinity propagation (IMEAP), is based on affinity propagation. Its multiple representatives, referred to as exemplars, are identified by our newly proposed approach, K-MEAP, which is insensitive to initialization and able to handle non-symmetric relational data. Inheriting these advantages of K-MEAP, IMEAP can handle large non-symmetric relational data.
Our third approach, called incremental multi-view fuzzy clustering based on minimax optimization (IMinimaxFCM), is proposed to handle large multi-view vector data. In IMinimaxFCM, the representatives, referred to as centroids, are identified by our newly proposed approach, MinimaxFCM, in which the consensus clustering result is generated by minimax optimization: the maximum disagreement among the different weighted views is minimized. Each data object in IMinimaxFCM has different views, and the final set of multi-view centroids representing the entire data set is obtained by clustering the centroids identified for the different views across all chunks.
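The chunk-then-representatives framework described in the abstract can be illustrated with a small sketch. This is a minimal illustration only, assuming plain weighted k-means (Lloyd's algorithm) as a stand-in for both the per-chunk step and the final step; it is not IMMFC, IMEAP or IMinimaxFCM, which use fuzzy medoids, affinity propagation exemplars and multi-view minimax optimization instead. The function names (`weighted_kmeans`, `incremental_cluster`) and the synthetic two-blob stream are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: Sequence[Point], centers: List[Point]) -> List[int]:
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]


def weighted_kmeans(points: Sequence[Point], weights: Sequence[float], k: int,
                    iters: int = 100, seed: int = 0) -> Tuple[List[Point], List[float]]:
    """Weighted Lloyd's k-means (a stand-in, not the thesis's method).

    Returns the centers and the total weight absorbed by each center.
    """
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(list(points), k)]
    dim = len(points[0])
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [(p, w) for p, w, lab in zip(points, weights, labels) if lab == j]
            total = sum(w for _, w in members)
            if total == 0:
                new_centers.append(centers[j])  # empty cluster: keep the old center
            else:
                new_centers.append(tuple(sum(w * p[d] for p, w in members) / total
                                         for d in range(dim)))
        if new_centers == centers:  # assignments stable, so centers no longer move
            break
        centers = new_centers
    labels = assign(points, centers)
    mass = [sum(w for w, lab in zip(weights, labels) if lab == j) for j in range(k)]
    return centers, mass


def incremental_cluster(stream: Sequence[Point], k: int, chunk_size: int) -> List[Point]:
    """Hypothetical chunk-based pipeline: summarise each chunk, then cluster the summaries."""
    reps: List[Point] = []
    rep_mass: List[float] = []
    for start in range(0, len(stream), chunk_size):
        chunk = stream[start:start + chunk_size]
        # Step 1: reduce the chunk to k representatives, each weighted by its member count.
        centers, mass = weighted_kmeans(chunk, [1.0] * len(chunk), min(k, len(chunk)))
        reps.extend(centers)
        rep_mass.extend(mass)
        # The raw chunk is not kept; only its representatives survive.
    # Step 2: the final clustering runs on the weighted representatives only.
    final, _ = weighted_kmeans(reps, rep_mass, k)
    return final


# Usage on a hypothetical stream: two well-separated 2-D blobs, interleaved.
rng = random.Random(42)
stream: List[Point] = []
for _ in range(100):
    stream.append((rng.gauss(0, 0.5), rng.gauss(0, 0.5)))
    stream.append((rng.gauss(10, 0.5), rng.gauss(10, 0.5)))
centers = sorted(incremental_cluster(stream, k=2, chunk_size=40))
print(centers)
```

Each chunk is reduced to k weighted centers and then discarded, so memory grows with the number of representatives rather than the number of objects. Weighting each representative by the number of objects it absorbed keeps the final pass from treating a sparse chunk's summary as equal to a dense one's; the thesis's use of multiple representatives per cluster targets the further weakness that a single center per cluster can misrepresent non-spherical clusters.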