Topic modeling on document networks with Dirichlet Optimal Transport Barycenter

Text documents are often interconnected in a network structure, e.g., academic papers via citations, Web pages via hyperlinks. On the one hand, though Graph Neural Networks (GNNs) have shown promising ability to derive effective embeddings for such networked documents, they do not assume a latent to...

Bibliographic Details
Main Authors: ZHANG, Ce, LAUW, Hady Wirawan
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2024
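Subjects: Graph Neural Networks; Text Mining; Optimal Transport; Dirichlet Distribution; Document Networks; Artificial Intelligence and Robotics; Computer Sciences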
Online Access: https://ink.library.smu.edu.sg/sis_research/9839
https://ink.library.smu.edu.sg/context/sis_research/article/10839/viewcontent/tkde23barycenter.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-10839
record_format dspace
spelling sg-smu-ink.sis_research-10839 2024-12-24T03:28:35Z Topic modeling on document networks with Dirichlet Optimal Transport Barycenter ZHANG, Ce LAUW, Hady Wirawan
2024-08-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/9839 info:doi/10.1109/TKDE.2023.3303465 https://ink.library.smu.edu.sg/context/sis_research/article/10839/viewcontent/tkde23barycenter.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Graph Neural Networks Text Mining Optimal Transport Dirichlet Distribution Document Networks Artificial Intelligence and Robotics Computer Sciences
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Graph Neural Networks
Text Mining
Optimal Transport
Dirichlet Distribution
Document Networks
Artificial Intelligence and Robotics
Computer Sciences
description Text documents are often interconnected in a network structure, e.g., academic papers via citations and Web pages via hyperlinks. On the one hand, although Graph Neural Networks (GNNs) have shown a promising ability to derive effective embeddings for such networked documents, they do not assume a latent topic structure and therefore produce uninterpretable embeddings. On the other hand, topic models can infer semantically interpretable topic distributions for documents by associating each topic with a group of understandable keywords. However, most topic models focus mainly on the plain text within documents and fail to leverage the network structure across documents. Network connectivity reveals topic similarity between linked documents, and modeling it could uncover meaningful semantics. Motivated by these two challenges, we propose a GNN-based neural topic model that both captures network connectivity and derives semantically interpretable topic distributions for networked documents. For network modeling, we build the model on the theory of Optimal Transport Barycenter, which captures network structure by allowing the topic distribution of a document to generate the content of its linked neighbors. For semantic interpretability, we extend optimal transport by incorporating semantically related words in the embedding space. Since the Dirichlet prior in Latent Dirichlet Allocation successfully improves topic quality, we also analyze the Dirichlet distribution as a prior distribution for optimal transport to improve topic interpretability. We design a rejection sampling procedure to simulate the Dirichlet distribution. Extensive experiments on document classification, clustering, link prediction, and topic analysis verify the effectiveness of our model.
format text
author ZHANG, Ce
LAUW, Hady Wirawan
title Topic modeling on document networks with Dirichlet Optimal Transport Barycenter
publisher Institutional Knowledge at Singapore Management University
publishDate 2024
_version_ 1820027796111491072