Spatial data clustering with boundary detection
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2010 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/41852 |
Institution: | Nanyang Technological University |
Summary: | To handle the many types of data arising in diverse areas, where the amount of data is still increasing dramatically, data mining has become one of the fastest-growing fields in the computer industry over the last decade. Data mining aims to discover knowledge implicitly stored in information repositories. Among data mining tasks, clustering distinguishes different groups of data (clusters) without given class labels. It is also called unsupervised learning in the artificial intelligence (AI) research field. In this study, we focus on clustering methods for analyzing spatial data, especially low-dimensional data. Applications in large spatial databases, multimedia, point-based graphics, and other areas have brought new requirements for spatial clustering, such as automatically discovering arbitrarily shaped and/or non-homogeneous clusters, detecting various types of outliers, and building cluster boundaries. To fulfill these requirements, which arise from different spatial applications, we proposed and implemented three novel algorithms. Our research concentrates on the implementation of 2D algorithms. For cases where the connectivity between data points is the main concern, we propose novel clustering algorithms based on specially constructed adaptive functions. The adaptive function-based clustering method uses these functions as influence functions to model the relationship between data points in a data set. Different features of the whole data set can be described by the field function, which is the sum of the influence functions of all data points. The clustering procedure is carried out according to a threshold value of the field function. The adaptive parameters introduced into the influence functions localize the field function based on both the local and the global data distribution. |
---|
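
The field-function idea in the summary can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the Gaussian influence function, the k-th nearest neighbour rule used as the adaptive parameter, the `link_radius` linking step, and all function names are assumptions chosen for illustration. The record itself only states that the field function is the sum of per-point influence functions and that clustering follows a threshold on it.

```python
import math

def influence(p, q, sigma):
    """Influence of data point q at location p (Gaussian kernel: an assumed choice)."""
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))

def field(p, points, sigmas):
    """Field function at p: the sum of the influence functions of all data points."""
    return sum(influence(p, q, s) for q, s in zip(points, sigmas))

def local_sigmas(points, k=2):
    """Hypothetical adaptive parameter: distance from each point to its k-th nearest neighbour."""
    sigmas = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        sigmas.append(dists[min(k, len(dists)) - 1] if dists else 1.0)
    return sigmas

def cluster(points, threshold, link_radius, sigmas):
    """Keep points whose field value reaches the threshold, then join kept points
    that lie within link_radius of each other. Points below the threshold get -1."""
    dense = [i for i, p in enumerate(points) if field(p, points, sigmas) >= threshold]
    labels = [-1] * len(points)
    current = 0
    for seed in dense:
        if labels[seed] != -1:
            continue  # already assigned to an earlier cluster
        labels[seed] = current
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in dense:
                if labels[j] == -1 and math.dist(points[i], points[j]) <= link_radius:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Two small 2D groups plus one isolated point (made-up data for illustration).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 20)]
sigmas = local_sigmas(points, k=2)
labels = cluster(points, threshold=2.0, link_radius=1.5, sigmas=sigmas)
print(labels)
```

With these made-up values, the isolated point receives almost no influence from the two groups, so its field value stays below the threshold and it is left unlabelled (`-1`), while each group forms its own cluster. Different influence functions or adaptive rules would change which points pass the threshold.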