Probabilistic Latent Document Network Embedding

A document network refers to a data type that can be represented as a graph of vertices, where each vertex is associated with a text document. Examples of such a data type include hyperlinked Web pages, academic publications with citations, and user profiles in social networks. Such data have very high-dimensional representations, in terms of text as well as network connectivity.

Bibliographic Details
Main Authors:	LE, Tuan M. V.; LAUW, Hady W.
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2014
Subjects:	dimensionality reduction; document network; embedding; visualization; topic modeling; generative model; Computer Sciences; Databases and Information Systems
Online Access:	https://ink.library.smu.edu.sg/sis_research/2594 https://ink.library.smu.edu.sg/context/sis_research/article/3594/viewcontent/icdm14.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-3594
record_format	dspace
spelling	sg-smu-ink.sis_research-3594 2017-12-26T09:21:49Z 2014-12-01T08:00:00Z text application/pdf info:doi/10.1109/ICDM.2014.119 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	dimensionality reduction document network embedding visualization topic modeling generative model Computer Sciences Databases and Information Systems
description	A document network refers to a data type that can be represented as a graph of vertices, where each vertex is associated with a text document. Examples of such a data type include hyperlinked Web pages, academic publications with citations, and user profiles in social networks. Such data have very high-dimensional representations, in terms of text as well as network connectivity. In this paper, we study the problem of embedding, or finding a low-dimensional representation of a document network that "preserves" the data as much as possible. These embedded representations are useful for various applications driven by dimensionality reduction, such as visualization or feature selection. While previous works in embedding have mostly focused on either the textual aspect or the network aspect, we advocate a holistic approach by finding a unified low-rank representation for both aspects. Moreover, to lend semantic interpretability to the low-rank representation, we further propose to integrate topic modeling and embedding within a joint model. The gist is to join the various representations of a document (words, links, topics, and coordinates) within a generative model, and to estimate the hidden representations through MAP estimation. We validate our model on real-life document networks, showing that it outperforms comparable baselines comprehensively on objective evaluation metrics.
format	text
author	LE, Tuan M. V. LAUW, Hady W.
title	Probabilistic Latent Document Network Embedding
publisher	Institutional Knowledge at Singapore Management University
publishDate	2014
url	https://ink.library.smu.edu.sg/sis_research/2594 https://ink.library.smu.edu.sg/context/sis_research/article/3594/viewcontent/icdm14.pdf
_version_	1770572520802484224

Similar Items