Text-attributed graph representation learning : Methods, applications, and challenges

Text documents are usually connected in a graph structure, resulting in an important class of data named text-attributed graph, e.g., paper citation graph and Web page hyperlink graph. On the one hand, Graph Neural Networks (GNNs) consider text in each document as general vertex attribute and do not...

Full description

Saved in:
Bibliographic Details
Main Authors: ZHANG, Ce, YANG, Menglin, YING, Rex, LAUW, Hady Wirawan
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2024
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/9841
https://ink.library.smu.edu.sg/context/sis_research/article/10841/viewcontent/webconf24tut.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English
Description
Summary:Text documents are usually connected in a graph structure, resulting in an important class of data named text-attributed graph, e.g., paper citation graph and Web page hyperlink graph. On the one hand, Graph Neural Networks (GNNs) consider text in each document as general vertex attribute and do not specifically deal with text data. On the other hand, Pre-trained Language Models (PLMs) and Topic Models (TMs) learn effective document embeddings. However, most models focus on text content in each single document only, ignoring link adjacency across documents. The above two challenges motivate the development of text-attributed graph representation learning, combining GNNs with PLMs and TMs into a unified model and learning document embeddings preserving both modalities, which fulfill applications, e.g., text classification, citation recommendation, question answering, etc. In this lecture-style tutorial, we will provide a systematic review of text-attributed graph, including its formal definition, recent methods, diverse applications, and challenges. Specifically, i) we will formally define text-attributed graph and briefly review GNNs, PLMs, and TMs, which are the fundamentals of some existing methods. ii) We will then revisit the technical details of text-attributed graph models, which are generally split into two categories, PLMbased and TM-based. iii) Besides, we will show diverse applications built on text-attributed graph. iv) Finally, we will discuss some challenges of existing models and propose solutions for future research.