Semantic visualization for short texts with word embeddings
Semantic visualization integrates topic modeling and visualization, such that every document is associated with a topic distribution as well as visualization coordinates on a low-dimensional Euclidean space. We address the problem of semantic visualization for short texts. Such documents are increas...
Main Authors: | LE, Van Minh Tuan; LAUW, Hady W. |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2017 |
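Subjects: | Machine Learning Data Mining Feature Selection/Construction Learning Graphical Models Databases and Information Systems Data Storage Systems |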
Online Access: | https://ink.library.smu.edu.sg/sis_research/3766 https://ink.library.smu.edu.sg/context/sis_research/article/4768/viewcontent/ijcai17a.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-4768 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-4768 2018-03-07T05:27:42Z Semantic visualization for short texts with word embeddings LE, Van Minh Tuan LAUW, Hady W. Semantic visualization integrates topic modeling and visualization, such that every document is associated with a topic distribution as well as visualization coordinates on a low-dimensional Euclidean space. We address the problem of semantic visualization for short texts. Such documents are increasingly common, including tweets, search snippets, news headlines, or status updates. Due to their short lengths, it is difficult to model semantics as the word co-occurrences in such a corpus are very sparse. Our approach is to incorporate auxiliary information, such as word embeddings from a larger corpus, to supplement the lack of co-occurrences. This requires the development of a novel semantic visualization model that seamlessly integrates visualization coordinates, topic distributions, and word vectors. We propose a model called GaussianSV, which outperforms pipelined baselines that derive topic models and visualization coordinates as disjoint steps, as well as semantic visualization baselines that do not consider word embeddings. 2017-08-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/3766 info:doi/10.24963/ijcai.2017/288 https://ink.library.smu.edu.sg/context/sis_research/article/4768/viewcontent/ijcai17a.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Machine Learning Data Mining Feature Selection/Construction Learning Graphical Models Databases and Information Systems Data Storage Systems |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Machine Learning Data Mining Feature Selection/Construction Learning Graphical Models Databases and Information Systems Data Storage Systems |
spellingShingle |
Machine Learning Data Mining Feature Selection/Construction Learning Graphical Models Databases and Information Systems Data Storage Systems LE, Van Minh Tuan LAUW, Hady W. Semantic visualization for short texts with word embeddings |
description |
Semantic visualization integrates topic modeling and visualization, such that every document is associated with a topic distribution as well as visualization coordinates on a low-dimensional Euclidean space. We address the problem of semantic visualization for short texts. Such documents are increasingly common, including tweets, search snippets, news headlines, or status updates. Due to their short lengths, it is difficult to model semantics as the word co-occurrences in such a corpus are very sparse. Our approach is to incorporate auxiliary information, such as word embeddings from a larger corpus, to supplement the lack of co-occurrences. This requires the development of a novel semantic visualization model that seamlessly integrates visualization coordinates, topic distributions, and word vectors. We propose a model called GaussianSV, which outperforms pipelined baselines that derive topic models and visualization coordinates as disjoint steps, as well as semantic visualization baselines that do not consider word embeddings. |
format |
text |
author |
LE, Van Minh Tuan; LAUW, Hady W. |
author_sort |
LE, Van Minh Tuan |
title |
Semantic visualization for short texts with word embeddings |
title_sort |
semantic visualization for short texts with word embeddings |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2017 |
url |
https://ink.library.smu.edu.sg/sis_research/3766 https://ink.library.smu.edu.sg/context/sis_research/article/4768/viewcontent/ijcai17a.pdf |