Semantic visualization for short texts with word embeddings

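Semantic visualization integrates topic modeling and visualization, such that every document is associated with a topic distribution as well as visualization coordinates on a low-dimensional Euclidean space. We address the problem of semantic visualization for short texts. Such documents are increasingly common, including tweets, search snippets, news headlines, or status updates. Due to their short lengths, it is difficult to model semantics as the word co-occurrences in such a corpus are very sparse. Our approach is to incorporate auxiliary information, such as word embeddings from a larger corpus, to supplement the lack of co-occurrences. This requires the development of a novel semantic visualization model that seamlessly integrates visualization coordinates, topic distributions, and word vectors. We propose a model called GaussianSV, which outperforms pipelined baselines that derive topic models and visualization coordinates as disjoint steps, as well as semantic visualization baselines that do not consider word embeddings.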

Bibliographic Details
Main Authors: LE, Van Minh Tuan, LAUW, Hady W.
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2017
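Subjects: Machine Learning; Data Mining; Feature Selection/Construction; Learning Graphical Models; Databases and Information Systems; Data Storage Systems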
Online Access: https://ink.library.smu.edu.sg/sis_research/3766
https://ink.library.smu.edu.sg/context/sis_research/article/4768/viewcontent/ijcai17a.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-4768
record_format dspace
spelling sg-smu-ink.sis_research-4768
2018-03-07T05:27:42Z
Semantic visualization for short texts with word embeddings
LE, Van Minh Tuan
LAUW, Hady W.
Semantic visualization integrates topic modeling and visualization, such that every document is associated with a topic distribution as well as visualization coordinates on a low-dimensional Euclidean space. We address the problem of semantic visualization for short texts. Such documents are increasingly common, including tweets, search snippets, news headlines, or status updates. Due to their short lengths, it is difficult to model semantics as the word co-occurrences in such a corpus are very sparse. Our approach is to incorporate auxiliary information, such as word embeddings from a larger corpus, to supplement the lack of co-occurrences. This requires the development of a novel semantic visualization model that seamlessly integrates visualization coordinates, topic distributions, and word vectors. We propose a model called GaussianSV, which outperforms pipelined baselines that derive topic models and visualization coordinates as disjoint steps, as well as semantic visualization baselines that do not consider word embeddings.
2017-08-01T07:00:00Z
text
application/pdf
https://ink.library.smu.edu.sg/sis_research/3766
info:doi/10.24963/ijcai.2017/288
https://ink.library.smu.edu.sg/context/sis_research/article/4768/viewcontent/ijcai17a.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
Machine Learning
Data Mining
Feature Selection/Construction
Learning Graphical Models
Databases and Information Systems
Data Storage Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Machine Learning
Data Mining
Feature Selection/Construction
Learning Graphical Models
Databases and Information Systems
Data Storage Systems
description Semantic visualization integrates topic modeling and visualization, such that every document is associated with a topic distribution as well as visualization coordinates on a low-dimensional Euclidean space. We address the problem of semantic visualization for short texts. Such documents are increasingly common, including tweets, search snippets, news headlines, or status updates. Due to their short lengths, it is difficult to model semantics as the word co-occurrences in such a corpus are very sparse. Our approach is to incorporate auxiliary information, such as word embeddings from a larger corpus, to supplement the lack of co-occurrences. This requires the development of a novel semantic visualization model that seamlessly integrates visualization coordinates, topic distributions, and word vectors. We propose a model called GaussianSV, which outperforms pipelined baselines that derive topic models and visualization coordinates as disjoint steps, as well as semantic visualization baselines that do not consider word embeddings.
format text
author LE, Van Minh Tuan
LAUW, Hady W.
title Semantic visualization for short texts with word embeddings
publisher Institutional Knowledge at Singapore Management University
publishDate 2017
url https://ink.library.smu.edu.sg/sis_research/3766
https://ink.library.smu.edu.sg/context/sis_research/article/4768/viewcontent/ijcai17a.pdf
_version_ 1770573727538348032