Clustering and heterogeneous information fusion for social media theme discovery and associative mining

Bibliographic Details
Main Author: Meng, Lei
Other Authors: Tan Ah Hwee
Format: Theses and Dissertations
Language: English
Published: 2015
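Subjects: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence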
Online Access: https://hdl.handle.net/10356/62096
Institution: Nanyang Technological University
id sg-ntu-dr.10356-62096
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
description The emergence of social networking websites has created numerous interactive sharing platforms on which users upload, comment on, and share multimedia content within their social circles. This has led to a massive number of web multimedia documents, together with rich meta-information such as category information, user tags and descriptions, and user comments. Such interconnected but heterogeneous social media data provide opportunities for understanding traditional multimedia data, such as images and text documents. More importantly, the different types of activities and interactions of social users can be used to understand and analyze user behavior and to discover social trends in social networks. Clustering is an important approach to the analysis and mining of social media data. However, unlike traditional multimedia data, social media data are typically massive, diverse, heterogeneous, and noisy. These characteristics raise new challenges for existing clustering techniques, including scalability to big data, the ability to automatically determine the number of clusters in a data set, strategies for effectively integrating data from heterogeneous sources for clustering, and robustness to noisy features. Moreover, because different users may prefer to categorize social media data differently, incorporating user preferences into the clustering framework to produce personalized clusters is also a challenge. To address these issues, this thesis investigates and develops novel clustering algorithms for fast and robust clustering of large-scale social media data by integrating their multiple, different types of features and user preferences, and explores their applications to associative social media mining tasks. Towards this goal, we have completed four key tasks.
First, we developed a two-step semi-supervised hierarchical clustering algorithm, termed Personalized Hierarchical Theme-based Clustering (PHTC), for personalized web image organization by exploiting the surrounding text of web images. Our experiments show that PHTC can identify high-quality clusters of web images under user supervision using the proposed semi-supervised clustering algorithm, Probabilistic Fusion Adaptive Resonance Theory (PF-ART). In addition, it can organize the clusters into a systematic hierarchy with higher quality and lower time cost than several existing hierarchical clustering algorithms. Second, we proposed a semi-supervised heterogeneous data co-clustering algorithm, termed Generalized Heterogeneous Fusion Adaptive Resonance Theory (GHF-ART), for multimedia data co-clustering by integrating different types of features from inter-related but heterogeneous data sources and user preferences. Compared with existing approaches, GHF-ART has the advantages of strong noise immunity, adaptive feature weighting, low computational cost, and incremental clustering for handling dynamic social media data. Third, we investigated the feasibility of applying GHF-ART to social network data to discover user communities in heterogeneous social networks, and demonstrated its capability for analyzing the correlation among different social links and mining the potential themes of user communities. Lastly, we studied the geometrical dynamics of Fuzzy ART and proposed three methods to adapt the vigilance parameter of Fuzzy ART. This leads to clustering algorithms that are insensitive to the input parameters when dealing with large and complex social media data. Our experiments have demonstrated the effectiveness of the proposed methods. Furthermore, the geometrical study of Fuzzy ART may also benefit further research.
While our completed studies have provided the base technologies for social media mining, future work following this thesis may focus on the following aspects: 1) Modeling of short and noisy text; 2) Automated selection of the vigilance parameter in Fuzzy ART; 3) Improvement of the clustering mechanism of ART; 4) Extension work on multimedia data indexing, annotation and retrieval; 5) Exploiting the temporal factor for multimedia data storage and mining; and 6) Associative applications to social media mining tasks.
author2 Tan Ah Hwee
author_facet Tan Ah Hwee
Meng, Lei
format Theses and Dissertations
author Meng, Lei
author_sort Meng, Lei
title Clustering and heterogeneous information fusion for social media theme discovery and associative mining
title_sort clustering and heterogeneous information fusion for social media theme discovery and associative mining
publishDate 2015
url https://hdl.handle.net/10356/62096
_version_ 1759857329801527296
spelling sg-ntu-dr.10356-62096 2023-03-04T00:42:23Z Clustering and heterogeneous information fusion for social media theme discovery and associative mining Meng, Lei Tan Ah Hwee School of Computer Engineering DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence DOCTOR OF PHILOSOPHY (SCE) 2015-01-13T09:16:34Z 2015-01-13T09:16:34Z 2014 2014 Thesis Meng, L. (2014). Clustering and heterogeneous information fusion for social media theme discovery and associative mining. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/62096 10.32657/10356/62096 en 170 p. application/pdf