Semantic Visualization with Neighborhood Graph Regularization

Visualization of high-dimensional data, such as text documents, is useful to map out the similarities among various data points. In the high-dimensional space, documents are commonly represented as bags of words, with dimensionality equal to the vocabulary size. Classical approaches to document visu...

Bibliographic Details
Main Authors: LE, Tuan Minh Van; LAUW, Hady W.
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2016
Subjects: Databases and Information Systems; Numerical Analysis and Scientific Computing
Online Access: https://ink.library.smu.edu.sg/sis_research/3252
https://ink.library.smu.edu.sg/context/sis_research/article/4254/viewcontent/11001_Article_Text_20504_1_10_20180216.pdf
https://ink.library.smu.edu.sg/context/sis_research/article/4254/filename/0/type/additional/viewcontent/SEMAFORE_master.zip
Institution: Singapore Management University
id sg-smu-ink.sis_research-4254
record_format dspace
spelling sg-smu-ink.sis_research-4254; 2019-07-23T00:59:49Z; 2016-04-01T07:00:00Z; text; application/pdf; info:doi/10.1613/jair.4983; http://creativecommons.org/licenses/by-nc-nd/4.0/; Research Collection School Of Computing and Information Systems; eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Databases and Information Systems
Numerical Analysis and Scientific Computing
description Visualization of high-dimensional data, such as text documents, is useful to map out the similarities among various data points. In the high-dimensional space, documents are commonly represented as bags of words, with dimensionality equal to the vocabulary size. Classical approaches to document visualization directly reduce this into visualizable two or three dimensions. Recent approaches consider an intermediate representation in topic space, between word space and visualization space, which preserves the semantics by topic modeling. While aiming for a good fit between the model parameters and the observed data, previous approaches have not considered the local consistency among data instances. We consider the problem of semantic visualization by jointly modeling topics and visualization on the intrinsic document manifold, modeled using a neighborhood graph. Each document has both a topic distribution and visualization coordinate. Specifically, we propose an unsupervised probabilistic model, called SEMAFORE, which aims to preserve the manifold in the lower-dimensional spaces through a neighborhood regularization framework designed for the semantic visualization task. To validate the efficacy of SEMAFORE, our comprehensive experiments on a number of real-life text datasets of news articles and Web pages show that the proposed methods outperform the state-of-the-art baselines on objective evaluation metrics.
format text
author LE, Tuan Minh Van
LAUW, Hady W.
title Semantic Visualization with Neighborhood Graph Regularization
publisher Institutional Knowledge at Singapore Management University
publishDate 2016