Robust models and novel similarity measures for high-dimensional data clustering

Bibliographic Details
Main Author: Nguyen, Duc Thang
Other Authors: Chan Chee Keong
Format: Theses and Dissertations
Language: English
Published: 2012
Online Access: https://hdl.handle.net/10356/48657
Institution: Nanyang Technological University
Description
Summary: The purpose of this thesis is to present our research on some of the fundamental issues encountered in high-dimensional data clustering. From our study of the current literature, we identify several important problems that remain open in the field and propose solutions to them. We investigate how statistical, machine learning and meta-heuristic techniques can be used to improve existing methods or develop novel models for unsupervised learning of high-dimensional data. Our goals are to develop efficient clustering algorithms that reflect the natural properties of high-dimensional data, are robust to outliers and are less sensitive to initialization; algorithms that are simple, fast and easily applicable, yet still produce good clustering quality. The main contributions of this thesis include a robust model-based clustering algorithm capable of handling noisy data, a novel similarity measure and the resulting algorithms for clustering text documents, and other related studies that help improve existing clustering algorithms.