Advancement in graph data mining: applications in unsupervised, continual, and few-shot learning

Graph mining has proven to be extremely useful in analysing features and properties of real-world graphs. This enables a number of tasks including the prediction and evaluation of how information varies with changes in the link structure, generating and building models to extract properties such as...

Bibliographic Details
Main Author: Rakaraddi, Appan
Other Authors: Lam Siew Kei
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2024
Subjects:
Online Access: https://hdl.handle.net/10356/176227
Institution: Nanyang Technological University
id sg-ntu-dr.10356-176227
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
Graph neural networks
Continual learning
Few-shot learning
description Graph mining has proven extremely useful for analysing the features and properties of real-world graphs. It enables tasks such as predicting and evaluating how information varies with changes in the link structure, and building models for link prediction, node classification, recommendation, and cluster and community detection. Deep learning techniques have become the prevalent approach to graph mining because they can take advantage of the increasing availability of graph data. Traditional deep learning models pre-process the graph data by mapping the nodes to real vectors, which can destroy important node relationships in the graph. Training traditional deep learning models on such pre-processed structures may produce unstable and inaccurate results. To overcome these issues, Graph Neural Networks (GNNs) were introduced to handle the non-Euclidean structure of graphs in many classification and regression tasks. However, state-of-the-art GNNs require a large amount of training data, which imposes a heavy burden of labelled-data annotation for supervised learning. GNNs also degrade as the number of layers increases, leading to oversmoothing during neighbourhood feature aggregation. The aim of this research is to overcome these challenges faced by GNNs in graph mining. We first formulate graph data mining as a regression problem: the estimation of Eigenvector Centrality in graphs with a GNN-based approach in a completely unsupervised learning setting. To achieve this, we define an Encoder-Decoder based model architecture called CUL. We show that, even when trained on a small number of datasets, the model performs at least on par with supervised methods in terms of accuracy across different embedding schemes such as Graph Convolutional Network (GCN), GraphSAGE, and Graph Attention Network (GAT). This is reinforced by the model's performance on the top-$\mathcal{N}\%$ accuracy metric on real-world and synthetically generated datasets, where CUL performs on par with or better than existing methods. Next, we focus on the application of GNNs to continual learning. We propose a model called GCL that learns across a sequence of tasks for node classification in graphs under task-incremental and class-incremental settings. GCL comprises a Reinforced Controller (RLC) and an expandable GNN framework called the Child Network (CN), along with a memory buffer that stores node samples. We compare our method against state-of-the-art methods in terms of average accuracy and average forgetting on four datasets with different GNNs. Our model outperforms the other methods, showing higher accuracy and lower forgetting rates across tasks in both task-incremental and class-incremental settings. Finally, we propose a method for improving few-shot node classification on graphs with any generic GNN backbone. We propose an uncertainty-based estimator, modelled using a GNN, that maps the scalar discrete probabilities output by a GNN classifier to a continuous probability distribution. Tests on different graph datasets show that few-shot node classification accuracy improves. We also focus on bridging gaps between different few-shot learning methods for node classification on graphs. The performance variance between the methods is analysed, and the pros and cons of each architecture are highlighted. This analysis aids in understanding what differentiates performance and in developing better architectures for few-shot node classification on graphs.
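As a point of reference for the Eigenvector Centrality task named in the description, the following minimal sketch computes the classical (non-learned) centrality by power iteration and scores a set of predictions by top-N% overlap. It is not the CUL model. The function names, the dense list-of-lists adjacency format, the undirected-graph assumption, and the specific reading of "top-N% accuracy" as the overlap between the true and predicted top-N% node sets are all assumptions made for illustration; the record does not define the metric.

```python
from math import sqrt


def eigenvector_centrality(adj, iters=1000, tol=1e-10):
    """Power iteration on a symmetric adjacency matrix (list of lists).

    Returns a unit-norm vector of centrality scores. This is the classical
    baseline computation, not a GNN-based estimator.
    """
    n = len(adj)
    x = [1.0 / sqrt(n)] * n
    for _ in range(iters):
        # Multiply by (A + I): same eigenvectors as A, but the shift
        # prevents oscillation on bipartite graphs.
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


def top_n_percent_overlap(true_scores, pred_scores, n_percent):
    """Assumed metric: share of the true top-N% nodes found in the predicted top-N%."""
    k = max(1, int(len(true_scores) * n_percent / 100))

    def top(scores):
        ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
        return set(ranked[:k])

    return len(top(true_scores) & top(pred_scores)) / k


# Hypothetical example: a 4-node star graph with node 0 at the centre.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
scores = eigenvector_centrality(star)
# Made-up predictions that also rank node 0 highest.
overlap = top_n_percent_overlap(scores, [0.9, 0.1, 0.2, 0.3], 25)
```

In the star graph the centre node receives the highest score, so any prediction that also ranks node 0 first achieves full overlap at N = 25. An unsupervised estimator of the kind the description mentions would be evaluated against scores like these rather than trained on them.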
author2 Lam Siew Kei
author_facet Lam Siew Kei
Rakaraddi, Appan
format Thesis-Doctor of Philosophy
author Rakaraddi, Appan
author_sort Rakaraddi, Appan
title Advancement in graph data mining: applications in unsupervised, continual, and few-shot learning
publisher Nanyang Technological University
publishDate 2024
url https://hdl.handle.net/10356/176227
_version_ 1806059862309732352
spelling sg-ntu-dr.10356-176227 2024-06-03T06:51:19Z Advancement in graph data mining: applications in unsupervised, continual, and few-shot learning Rakaraddi, Appan Lam Siew Kei School of Computer Science and Engineering Mahardhika Pratama ASSKLam@ntu.edu.sg Computer and Information Science Graph neural networks Continual learning Few-shot learning Doctor of Philosophy 2024-05-14T05:29:05Z 2024-05-14T05:29:05Z 2024 Thesis-Doctor of Philosophy Rakaraddi, A. (2024). Advancement in graph data mining: applications in unsupervised, continual, and few-shot learning. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/176227 10.32657/10356/176227 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University