Semantic visualization for short texts with word embeddings

Semantic visualization integrates topic modeling and visualization, such that every document is associated with a topic distribution as well as visualization coordinates on a low-dimensional Euclidean space. We address the problem of semantic visualization for short texts. Such documents are increas...

Bibliographic Details
Main Authors:	LE, Van Minh Tuan; LAUW, Hady W.
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2017
Subjects:	Machine Learning; Data Mining; Feature Selection/Construction; Learning Graphical Models; Databases and Information Systems; Data Storage Systems
Online Access:	https://ink.library.smu.edu.sg/sis_research/3766 https://ink.library.smu.edu.sg/context/sis_research/article/4768/viewcontent/ijcai17a.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-4768
record_format	dspace
spelling	sg-smu-ink.sis_research-4768 2018-03-07T05:27:42Z 2017-08-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/3766 info:doi/10.24963/ijcai.2017/288 https://ink.library.smu.edu.sg/context/sis_research/article/4768/viewcontent/ijcai17a.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Machine Learning; Data Mining; Feature Selection/Construction; Learning Graphical Models; Databases and Information Systems; Data Storage Systems
description	Semantic visualization integrates topic modeling and visualization, such that every document is associated with a topic distribution as well as visualization coordinates on a low-dimensional Euclidean space. We address the problem of semantic visualization for short texts. Such documents are increasingly common, including tweets, search snippets, news headlines, or status updates. Due to their short lengths, it is difficult to model semantics as the word co-occurrences in such a corpus are very sparse. Our approach is to incorporate auxiliary information, such as word embeddings from a larger corpus, to supplement the lack of co-occurrences. This requires the development of a novel semantic visualization model that seamlessly integrates visualization coordinates, topic distributions, and word vectors. We propose a model called GaussianSV, which outperforms pipelined baselines that derive topic models and visualization coordinates as disjoint steps, as well as semantic visualization baselines that do not consider word embeddings.
format	text
author	LE, Van Minh Tuan; LAUW, Hady W.
author_sort	LE, Van Minh Tuan
title	Semantic visualization for short texts with word embeddings
publisher	Institutional Knowledge at Singapore Management University
publishDate	2017
url	https://ink.library.smu.edu.sg/sis_research/3766 https://ink.library.smu.edu.sg/context/sis_research/article/4768/viewcontent/ijcai17a.pdf
_version_	1770573727538348032

Similar Items