Incremental clustering methods and their applications on large data analysis

Bibliographic Details
Main Author: Wang, Yangtao
Other Authors: Chen Lihui
Format: Theses and Dissertations
Language: English
Published: 2016
Online Access: http://hdl.handle.net/10356/66240
Institution: Nanyang Technological University
Description
Summary: Large data sets have become prevalent because huge amounts of data can now be collected easily every day. In this thesis, we propose three new incremental clustering approaches for large relational and multi-view data analysis, based on fuzzy clustering and affinity propagation based clustering. In the incremental clustering framework, the key idea is to find representatives for each cluster in a data chunk; the final data analysis is then carried out on all the identified representatives. To represent the structure of a cluster more accurately and improve clustering performance, our work considers two directions: using multiple representatives for each cluster and exploring multi-view features. The first proposed approach is incremental multiple medoids based fuzzy clustering (IMMFC). In IMMFC, multiple representatives, referred to as medoids, are identified for each cluster in a data chunk by introducing a weight for each object, which measures how well the object represents the cluster in the chunk. We also propose a mechanism that uses relationships among the medoids identified in each chunk as side information to guide the generation of the final set of medoids. The second approach is incremental multiple exemplars affinity propagation (IMEAP), an affinity propagation based approach. Its multiple representatives, referred to as exemplars, are identified by our newly proposed approach, K-MEAP. K-MEAP is insensitive to initialization and able to handle non-symmetric relational data. Inheriting the advantages of K-MEAP, IMEAP is able to handle large non-symmetric relational data. Our third approach, incremental multi-view fuzzy clustering based on minimax optimization (IMinimaxFCM), is proposed to handle large multi-view vector data. In IMinimaxFCM, the identification of representatives, referred to as centroids, is based on our newly proposed approach, MinimaxFCM. In MinimaxFCM, the consensus clustering result is generated by minimax optimization, in which the maximum disagreement among the different weighted views is minimized. Each data object in IMinimaxFCM has different views, and the final set of multi-view centroids representing the entire data set is obtained by clustering the centroids identified for the different views from all the chunks.
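
The chunk-and-representatives idea described in the summary can be illustrated with a short sketch. This is not IMMFC, IMEAP, or IMinimaxFCM: as a stand-in for the thesis's fuzzy and affinity propagation methods, it uses plain hard k-medoids with a farthest-point initialization, and it picks multiple representatives per cluster by simply ranking members by total distance to their cluster. All function names and parameters (`k_medoids`, `representatives`, `incremental_clustering`, `reps_per_cluster`) are hypothetical and chosen only for illustration.

```python
def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def assign(points, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist(p, medoids[c]))
            for p in points]


def k_medoids(points, k, iters=20):
    """Plain hard k-medoids (an assumed stand-in, not the thesis's method)."""
    # Farthest-point initialization: deterministic, no random seed needed.
    medoids = [points[0]]
    while len(medoids) < k:
        medoids.append(max(points,
                           key=lambda p: min(dist(p, m) for m in medoids)))
    for _ in range(iters):
        labels = assign(points, medoids)
        new_medoids = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if not members:              # keep the old medoid for an empty cluster
                new_medoids.append(medoids[c])
                continue
            # The medoid is the member with the smallest total distance to the rest.
            new_medoids.append(
                min(members, key=lambda m: sum(dist(m, q) for q in members)))
        if new_medoids == medoids:       # converged
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)


def representatives(chunk, k, reps_per_cluster):
    """Pick up to reps_per_cluster representatives for each cluster in a chunk."""
    _, labels = k_medoids(chunk, k)
    reps = []
    for c in range(k):
        members = [p for p, l in zip(chunk, labels) if l == c]
        ranked = sorted(members,
                        key=lambda m: sum(dist(m, q) for q in members))
        reps.extend(ranked[:reps_per_cluster])
    return reps


def incremental_clustering(data, chunk_size, k, reps_per_cluster=2):
    """Cluster chunk by chunk, then cluster the collected representatives."""
    all_reps = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        all_reps.extend(representatives(chunk, k, reps_per_cluster))
    final_medoids, _ = k_medoids(all_reps, k)
    return final_medoids
```

Only one chunk plus the accumulated representatives needs to be in memory at a time, which is the point of the incremental framework. The sketch omits the object weights, the side information between chunks, and the multi-view handling that the summary attributes to the three proposed approaches.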