Semantic Visualization for Spherical Representation

Visualization of high-dimensional data such as text documents is widely applicable. The traditional means is to find an appropriate embedding of the high-dimensional representation in a low-dimensional visualizable space. As topic modeling is a useful form of dimensionality reduction that preserves...

Full description

Saved in:
Bibliographic Details
Main Authors: LE, Tuan M. V., LAUW, Hady W.
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2014
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/2250
https://ink.library.smu.edu.sg/context/sis_research/article/3250/viewcontent/kdd14.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English
Description
Summary:Visualization of high-dimensional data such as text documents is widely applicable. The traditional means is to find an appropriate embedding of the high-dimensional representation in a low-dimensional visualizable space. As topic modeling is a useful form of dimensionality reduction that preserves the semantics in documents, recent approaches aim for a visualization that is consistent with both the original word space, as well as the semantic topic space. In this paper, we address the semantic visualization problem. Given a corpus of documents, the objective is to simultaneously learn the topic distributions as well as the visualization coordinates of documents. We propose to develop a semantic visualization model that approximates L2-normalized data directly. The key is to associate each document with three representations: a coordinate in the visualization space, a multinomial distribution in the topic space, and a directional vector in a high-dimensional unit hypersphere in the word space. We join these representations in a unified generative model, and describe its parameter estimation through variational inference. Comprehensive experiments on real-life text datasets show that the proposed method outperforms the existing baselines on objective evaluation metrics for visualization quality and topic interpretability.