High dimensional clustering for mixture models

Clustering is an essential subject in unsupervised learning. It is a common technique used in many fields, including machine learning, statistics, bioinformatics, and computer graphics. Samples are classified into homogeneous groups according to different criteria. In this thesis, we focus on the clu...

Bibliographic Details
Main Author:	Liu, Yiming
Other Authors:	Pan Guangming
Format:	Thesis-Doctor of Philosophy
Language:	English
Published:	Nanyang Technological University 2020
Subjects:	Science::Mathematics
Online Access:	https://hdl.handle.net/10356/142941
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-142941
record_format	dspace
spelling	sg-ntu-dr.10356-142941 2023-02-28T23:33:16Z High dimensional clustering for mixture models Liu, Yiming Pan Guangming School of Physical and Mathematical Sciences GMPAN@ntu.edu.sg Science::Mathematics
Doctor of Philosophy 2020-07-14T07:27:43Z 2020-07-14T07:27:43Z 2020 Thesis-Doctor of Philosophy Liu, Y. (2020). High dimensional clustering for mixture models. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/142941 10.32657/10356/142941 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Science::Mathematics
spellingShingle	Science::Mathematics Liu, Yiming High dimensional clustering for mixture models
description	Clustering is an essential subject in unsupervised learning. It is a common technique used in many fields, including machine learning, statistics, bioinformatics, and computer graphics. Samples are classified into homogeneous groups according to different criteria. In this thesis, we focus on clusters that are characterized by different parameters (i.e., means and covariances), and we study clustering methods for high-dimensional mixture data. In this setting, we propose two new methods, the Covariance clustering method and the Two-step method. We also investigate and develop the Mean clustering method from both theoretical and practical perspectives using random matrix theory. Specifically, the first part focuses on clustering when the data are drawn from a mixture distribution with distinct covariance matrices. We provide a new algorithm to address this problem and derive its misclustering rate theoretically. In the second part, for data with different means, we provide noncentered and centered versions of the Mean clustering method. Moreover, to justify these two methods theoretically, we prove that two results for large-dimensional sample covariance matrices, namely that no eigenvalue lies outside the support of the limiting spectral distribution and that the eigenvalues separate exactly, can be extended to low-rank information plus general noise models. In the third part, when either the means or the covariances are distinct, we propose a Two-step method for clustering. Both theoretical and numerical properties of the Two-step method are discussed. Simulation studies and real data analysis also demonstrate that the Two-step method outperforms the other methods under a variety of settings.
author2	Pan Guangming
author_facet	Pan Guangming Liu, Yiming
format	Thesis-Doctor of Philosophy
author	Liu, Yiming
author_sort	Liu, Yiming
title	High dimensional clustering for mixture models
title_short	High dimensional clustering for mixture models
title_full	High dimensional clustering for mixture models
title_fullStr	High dimensional clustering for mixture models
title_full_unstemmed	High dimensional clustering for mixture models
title_sort	high dimensional clustering for mixture models
publisher	Nanyang Technological University
publishDate	2020
url	https://hdl.handle.net/10356/142941
_version_	1759853325275103232

Similar Items