Adaptive scaling of cluster boundaries for large-scale social media data clustering

Bibliographic Details
Main Authors: MENG, Lei; TAN, Ah-hwee; WUNSCH, Donald C.
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2015
Online Access: https://ink.library.smu.edu.sg/sis_research/5235
https://ink.library.smu.edu.sg/context/sis_research/article/6238/viewcontent/Adaptive_Scaling_of_Cluster_Boundaries___TNNLS_2016_Preprint.pdf
Institution: Singapore Management University
DOI: 10.1109/TNNLS.2015.2498625
Publication date: 2015-12-01
Collection: Research Collection School Of Computing and Information Systems, InK@SMU (SMU Libraries)
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Subjects: Clustering; Big social media data; Adaptive Resonance Theory; Vigilance region; Adaptive parameter tuning; Computer and Systems Architecture; Databases and Information Systems; OS and Networks
Description: The large-scale and complex nature of social media data raises the need to scale clustering techniques to big data and to make them capable of automatically identifying data clusters with few empirical settings. In this paper, we present our investigation and three algorithms based on fuzzy adaptive resonance theory (Fuzzy ART) that have linear computational complexity, use a single parameter, i.e., the vigilance parameter, to identify data clusters, and are robust to modest parameter settings. The contribution of this paper lies in two aspects. First, we theoretically demonstrate how complement coding, commonly known as a normalization method, changes the clustering mechanism of Fuzzy ART, and we discover the vigilance region (VR) that essentially determines how a cluster in the Fuzzy ART system recognizes similar patterns in the feature space. The VR gives an intrinsic interpretation of the clustering mechanism and the limitations of Fuzzy ART. Second, we introduce the idea of allowing different clusters in the Fuzzy ART system to have different vigilance levels in order to accommodate the diverse pattern distributions of social media data. To this end, we propose three vigilance adaptation methods, namely, the activation maximization (AM) rule, the confliction minimization (CM) rule, and the hybrid integration (HI) rule. Starting from an initial vigilance value, the resulting clustering algorithms, AM-ART, CM-ART, and HI-ART, can automatically adapt the vigilance values of all clusters during the learning epochs to produce better cluster boundaries. Experiments on four social media data sets show that AM-ART, CM-ART, and HI-ART are more robust than Fuzzy ART to the initial vigilance value, and that they usually achieve better or comparable performance at a much faster speed than state-of-the-art clustering algorithms that also do not require a predefined number of clusters.
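To make the clustering mechanism in the description more concrete, the following is a minimal Python/NumPy sketch of standard Fuzzy ART with complement coding: the choice function, the vigilance (match) test, and learning. It is an illustration under stated assumptions, not the authors' code. The class name `FuzzyART`, the helper `complement_code`, and the default choice parameter `alpha` and learning rate `beta` are our own hypothetical choices, and inputs are assumed to be scaled to [0, 1]. Each cluster stores its own vigilance value, as AM-ART, CM-ART, and HI-ART require, but here it simply keeps the initial value, because the AM, CM, and HI update rules are not reproduced.

```python
import numpy as np


def complement_code(x):
    """Complement coding (hypothetical helper): a in [0,1]^d -> I = (a, 1 - a), so sum(I) = d."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([x, 1.0 - x])


class FuzzyART:
    """Sketch of standard Fuzzy ART with a per-cluster vigilance list.

    Assumption: every new cluster starts at, and here keeps, the initial
    vigilance; the paper's AM/CM/HI adaptation rules are NOT implemented.
    """

    def __init__(self, rho=0.75, alpha=0.01, beta=1.0):
        self.rho_init = rho   # initial vigilance value
        self.alpha = alpha    # choice parameter (assumed default)
        self.beta = beta      # learning rate; 1.0 means fast learning
        self.weights = []     # one complement-coded weight vector per cluster
        self.rho = []         # per-cluster vigilance values

    def learn(self, x):
        """Present one input, update the network, and return its cluster index."""
        I = complement_code(x)
        norm_I = I.sum()  # equals d for every input because of complement coding
        # Choice function T_j = |I ^ w_j| / (alpha + |w_j|), with ^ = element-wise min.
        scores = [np.minimum(I, w).sum() / (self.alpha + w.sum()) for w in self.weights]
        # Try clusters in order of decreasing choice value.
        for j in np.argsort(scores)[::-1]:
            w = self.weights[j]
            match = np.minimum(I, w).sum() / norm_I
            if match >= self.rho[j]:  # vigilance test passed: resonance
                self.weights[j] = self.beta * np.minimum(I, w) + (1 - self.beta) * w
                return int(j)
        # No cluster passed its vigilance test: create a new cluster from the input.
        self.weights.append(I.copy())
        self.rho.append(self.rho_init)
        return len(self.weights) - 1


if __name__ == "__main__":
    data = [[0.10, 0.20], [0.12, 0.22], [0.90, 0.85], [0.88, 0.90]]
    art = FuzzyART(rho=0.8)
    print([art.learn(x) for x in data])  # → [0, 0, 1, 1]
```

Because complement coding gives every input the same norm d, the match value |I ∧ w_j| / d equals 1 minus the size of the smallest hyper-rectangle covering both cluster j and the input, divided by d. A higher vigilance therefore yields tighter cluster boundaries and more clusters, which is why a single global vigilance value can be hard to set for data with diverse pattern distributions.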