Cluster analysis on dynamic graphs

As the collection of data becomes common with the big data revolution, an increasing number of methods are created for data analysis. One important field of data is graph data. Much research has been done for static graph analysis in recent years. However, graph data today is continuously evolving (...

Full description

Saved in:
Bibliographic Details
Main Author: Lee, Cecilia Si Si
Other Authors: Ke Yiping, Kelly
Format: Final Year Project
Language:English
Published: Nanyang Technological University 2022
Subjects:
Online Access:https://hdl.handle.net/10356/156636
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Description
Summary:As the collection of data becomes common with the big data revolution, an increasing number of methods are created for data analysis. One important field of data is graph data. Much research has been done for static graph analysis in recent years. However, graph data today is continuously evolving (not static), especially with the constant expansion of the Internet. For example, social networks are changing every second as users can add or remove social connections. Clustering is often used as a technique to identify the different social clusters. Traditionally, graphs are represented as adjacency lists and adjacency matrices. However, as data size increases, it becomes impractical to store graphs in adjacency lists and matrices. Thus, graphs are represented as embeddings. Embedding is a compressed representation of the graph, and it is a vector-based representation of the graph. Thus, this representation can be used as input for machine learning. There are a few research that proposes dynamic graph embedding methods. However, little work has been done to use these dynamic graph embedding methods to get embeddings and then perform clustering on them. Thus, in this project, we will specifically experiment with the different embedding methods and perform clustering on the embeddings. This project aims to study the clustering problem on dynamic graphs that evolve and change over time. It will find a method that does not require re-clustering of the graph when it is modified as well as maintain the accuracy of the clustering of the dynamic graphs. We make the following specific contributions in this project: 1. In-depth literature review of recent state-of-art dynamic graph clustering methods. 2. Propose a machine learning graph pipeline that merges different dynamic graph embedding methods with k-means clustering, to improve current dynamic graph clustering methods. 3. Perform in-depth evaluation and analysis of the different embedding models by carrying out experiments with simulated and real-life datasets; with DyEARNN being the best embedding model for dynamic clustering.