From node embedding to community embedding on graphs

In the last few years, graphs have become an instinctive representative tool to better study complex structures. For example, in chemistry, it is common to represent and study the interaction between different proteins as a graph, reducing the experimental costs. Similarly, with the consolidation of...

Full description

Saved in:
Bibliographic Details
Main Author: Cavallari, Sandro
Other Authors: Erik Cambria
Format: Theses and Dissertations
Language:English
Published: 2019
Subjects:
Online Access:https://hdl.handle.net/10356/85349
http://hdl.handle.net/10220/48265
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-85349
record_format dspace
institution Nanyang Technological University
building NTU Library
country Singapore
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering
spellingShingle DRNTU::Engineering::Computer science and engineering
Cavallari, Sandro
From node embedding to community embedding on graphs
description In the last few years, graphs have become an instinctive representative tool to better study complex structures. For example, in chemistry, it is common to represent and study the interaction between different proteins as a graph, reducing the experimental costs. Similarly, with the consolidation of Web 2.0 and the rise of Web 3.0, social networks have become one of the most popular sources of information, wherein users not only utilize but also provide content. On such platforms, the users’ information is explicitly provided and most of the information is implicitly contained in the links between different users. In robotics, it is possible to achieve more a “human-like” movement of an artificial skeleton if we represent the interaction of each major joint rather than adopting an independent approach. In addition to its representation capabilities, effectively extracting valuable information from graph data remains an open problem. While, in the last few years, machine learning, particularly deep learning, has achieved outstanding results in many complex applications such as image processing or natural language processing, researchers have not yet found a legitimate technique to leverage the structural information contained in the graph data type. Moreover, it is not yet possible to fully unleash the learning capabilities of deep models. Thus, in this thesis, we will introduce the concept of graph embedding, a robust set of techniques able to automatically extract meaningful information from network structures. First, we will provide an understanding of the problem setting by formally defining each component. Subsequently, we provide a review of all the major studies related to this topic. Then, we study an important yet largely under-explored setting of graph embedding, i.e., embedding communities rather than individual nodes. We identify that community embedding is not only useful for community-level applications such as graph visualization but also provides an exciting opportunity to improve community detection and node classification. To learn such embedding, our insight hinges on a closed loop among community embedding, community detection, and node embedding. On the one hand, node embedding can help improve community detection, which outputs good communities for fitting better community embedding. On the other hand, community embedding can be used to optimize the node embedding by introducing a community-aware high-order proximity. Based on this insights, we propose a novel Community Embedding (ComE) model that jointly solves the three tasks together. In practice, the number of communities can be unknown beforehand; therefore, we further proposed ComE+, a new framework that handles both uncertainty of unknown ground truth community assignments and an unknown number of communities. We further developed an efficient inference algorithm for ComE+ by maintaining the complexity as low as O(|V| + |E|). We evaluated the proposed algorithms on multiple real-world data sets and showed that they improve graph visualization and outperform the state-of-the-art baselines in various application tasks, e.g., community detection and node classification. Finally, given the proliferation of dynamic interactive structure in many socio-economic problems, we argue that a new set of embedding techniques are required. Recently, spatio-temporal graphs and interaction networks have proven to be useful in abstracting the underling interactive structure that defines the influence between the various aspects of a problem, but most of the previous studies have developed architecture that is able to only preserve manually crafted interrelation. However, we argue that it is necessary to automatically learn from such interaction structures if we want such models to fit in real-world problems. We also require that the discovered latent structure should be easily interpretable to facilitate the understanding of the modern world. Thus, we propose an efficient neural mixture model able to project a dynamic interaction in a fixed size space, preserving the influence that each node undergoes because of its neighbors. The proposed architecture is able to outperform strong baselines in various data sets ranging from the prediction of the household’s energy load to traffic congestion, and many others.
author2 Erik Cambria
author_facet Erik Cambria
Cavallari, Sandro
format Theses and Dissertations
author Cavallari, Sandro
author_sort Cavallari, Sandro
title From node embedding to community embedding on graphs
title_short From node embedding to community embedding on graphs
title_full From node embedding to community embedding on graphs
title_fullStr From node embedding to community embedding on graphs
title_full_unstemmed From node embedding to community embedding on graphs
title_sort from node embedding to community embedding on graphs
publishDate 2019
url https://hdl.handle.net/10356/85349
http://hdl.handle.net/10220/48265
_version_ 1681059257823264768
spelling sg-ntu-dr.10356-853492020-07-01T05:44:21Z From node embedding to community embedding on graphs Cavallari, Sandro Erik Cambria School of Computer Science and Engineering DRNTU::Engineering::Computer science and engineering In the last few years, graphs have become an instinctive representative tool to better study complex structures. For example, in chemistry, it is common to represent and study the interaction between different proteins as a graph, reducing the experimental costs. Similarly, with the consolidation of Web 2.0 and the rise of Web 3.0, social networks have become one of the most popular sources of information, wherein users not only utilize but also provide content. On such platforms, the users’ information is explicitly provided and most of the information is implicitly contained in the links between different users. In robotics, it is possible to achieve more a “human-like” movement of an artificial skeleton if we represent the interaction of each major joint rather than adopting an independent approach. In addition to its representation capabilities, effectively extracting valuable information from graph data remains an open problem. While, in the last few years, machine learning, particularly deep learning, has achieved outstanding results in many complex applications such as image processing or natural language processing, researchers have not yet found a legitimate technique to leverage the structural information contained in the graph data type. Moreover, it is not yet possible to fully unleash the learning capabilities of deep models. Thus, in this thesis, we will introduce the concept of graph embedding, a robust set of techniques able to automatically extract meaningful information from network structures. First, we will provide an understanding of the problem setting by formally defining each component. Subsequently, we provide a review of all the major studies related to this topic. Then, we study an important yet largely under-explored setting of graph embedding, i.e., embedding communities rather than individual nodes. We identify that community embedding is not only useful for community-level applications such as graph visualization but also provides an exciting opportunity to improve community detection and node classification. To learn such embedding, our insight hinges on a closed loop among community embedding, community detection, and node embedding. On the one hand, node embedding can help improve community detection, which outputs good communities for fitting better community embedding. On the other hand, community embedding can be used to optimize the node embedding by introducing a community-aware high-order proximity. Based on this insights, we propose a novel Community Embedding (ComE) model that jointly solves the three tasks together. In practice, the number of communities can be unknown beforehand; therefore, we further proposed ComE+, a new framework that handles both uncertainty of unknown ground truth community assignments and an unknown number of communities. We further developed an efficient inference algorithm for ComE+ by maintaining the complexity as low as O(|V| + |E|). We evaluated the proposed algorithms on multiple real-world data sets and showed that they improve graph visualization and outperform the state-of-the-art baselines in various application tasks, e.g., community detection and node classification. Finally, given the proliferation of dynamic interactive structure in many socio-economic problems, we argue that a new set of embedding techniques are required. Recently, spatio-temporal graphs and interaction networks have proven to be useful in abstracting the underling interactive structure that defines the influence between the various aspects of a problem, but most of the previous studies have developed architecture that is able to only preserve manually crafted interrelation. However, we argue that it is necessary to automatically learn from such interaction structures if we want such models to fit in real-world problems. We also require that the discovered latent structure should be easily interpretable to facilitate the understanding of the modern world. Thus, we propose an efficient neural mixture model able to project a dynamic interaction in a fixed size space, preserving the influence that each node undergoes because of its neighbors. The proposed architecture is able to outperform strong baselines in various data sets ranging from the prediction of the household’s energy load to traffic congestion, and many others. Doctor of Philosophy 2019-05-17T07:27:27Z 2019-12-06T16:02:10Z 2019-05-17T07:27:27Z 2019-12-06T16:02:10Z 2019 Thesis Cavallari, S. (2019). From node embedding to community embedding on graphs. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/85349 http://hdl.handle.net/10220/48265 10.32657/10220/48265 en 118 p. application/pdf