Pre-training on large-scale heterogeneous graph
Graph neural networks (GNNs) have emerged as the state-of-the-art representation learning methods on graphs, and they often rely on a large amount of labeled data to achieve satisfactory performance. Recently, to relieve the label scarcity issue, some works propose to pre-train GNNs in a self-supervised manner by distilling transferable knowledge from unlabeled graph structures. Unfortunately, these pre-training frameworks mainly target homogeneous graphs, while real interaction systems usually constitute large-scale heterogeneous graphs containing different types of nodes and edges, which leads to new challenges of structural heterogeneity and scalability for graph pre-training. In this paper, we first study the problem of pre-training on large-scale heterogeneous graphs and propose a novel pre-training GNN framework, named PT-HGNN. The proposed PT-HGNN designs both node- and schema-level pre-training tasks to contrastively preserve heterogeneous semantic and structural properties as a form of transferable knowledge for various downstream tasks. In addition, a relation-based personalized PageRank is proposed to sparsify the large-scale heterogeneous graph for efficient pre-training. Extensive experiments on one of the largest public heterogeneous graphs (OAG) demonstrate that our PT-HGNN significantly outperforms various state-of-the-art baselines.
Main Authors: JIANG, Xunqiang; JIA, Tianrui; FANG, Yuan; SHI, Chuan; LIN, Zhe; WANG, Hui
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2021
Subjects: Heterogeneous graph; Self-supervised learning; Pre-training; Databases and Information Systems
Online Access: https://ink.library.smu.edu.sg/sis_research/6888
https://ink.library.smu.edu.sg/context/sis_research/article/7891/viewcontent/KDD21_PT_HGNN.pdf
Institution: Singapore Management University
Record ID: sg-smu-ink.sis_research-7891
Date Available: 2021-08-01
DOI: 10.1145/3447548.3467396
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Research Collection School Of Computing and Information Systems
Content Provider: SMU Libraries
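The abstract mentions sparsifying the graph with a relation-based personalized PageRank before pre-training. As a rough, generic illustration only (not the paper's actual relation-based algorithm), plain personalized PageRank can be computed by power iteration and used to keep, for each node, only its highest-scoring edges; the function names `personalized_pagerank` and `sparsify_top_k` below are hypothetical:

```python
import numpy as np

def personalized_pagerank(adj, seed, alpha=0.15, iters=50):
    """Power iteration for personalized PageRank from a single seed node.

    adj: dense adjacency matrix (n x n); seed: restart node index;
    alpha: restart probability. Returns a score vector over all nodes.
    Assumes every node has at least one edge (no dangling columns).
    """
    n = adj.shape[0]
    # Column-normalize so each column sums to 1 (transition probabilities).
    deg = adj.sum(axis=0)
    P = adj / np.where(deg == 0, 1, deg)
    r = np.zeros(n)
    r[seed] = 1.0  # restart distribution concentrated on the seed
    x = r.copy()
    for _ in range(iters):
        x = (1 - alpha) * P @ x + alpha * r
    return x

def sparsify_top_k(adj, k=2, alpha=0.15):
    """Keep, for each node, only edges to its k highest-PPR neighbors."""
    n = adj.shape[0]
    kept = np.zeros_like(adj)
    for u in range(n):
        scores = personalized_pagerank(adj, u, alpha)
        scores[u] = -np.inf            # never keep a self-loop
        scores[adj[u] == 0] = -np.inf  # only existing edges are candidates
        for v in np.argsort(scores)[::-1][:k]:
            if scores[v] > -np.inf:
                kept[u, v] = 1
    return kept
```

In this sketch the retained edge set depends only on structure; a relation-aware variant along the lines the abstract describes would additionally condition the transition probabilities on edge types, which is omitted here.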