Meta-complementing the semantics of short texts in neural topic models

Bibliographic Details
Main Authors: ZHANG, Ce; LAUW, Hady Wirawan
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2022
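Subjects: Topic models; short documents; document connectivity; improved topic quality; Databases and Information Systems; Numerical Analysis and Scientific Computing; Theory and Algorithms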
Online Access: https://ink.library.smu.edu.sg/sis_research/7609
https://ink.library.smu.edu.sg/context/sis_research/article/8612/viewcontent/neurips22.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-8612
record_format dspace
spelling sg-smu-ink.sis_research-8612 2022-12-22T03:29:29Z 2022-11-01T07:00:00Z text application/pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Topic models
short documents
document connectivity
improved topic quality
Databases and Information Systems
Numerical Analysis and Scientific Computing
Theory and Algorithms
description Topic models infer latent topic distributions from observed word co-occurrences in a text corpus. Although a corpus typically contains documents of variable lengths, most previous topic models treat documents of different lengths uniformly, assuming that each document is sufficiently informative. However, shorter documents may have only a few word co-occurrences, resulting in inferior topic quality. Other previous works assume that all documents are short and leverage external auxiliary data, e.g., pretrained word embeddings and document connectivity. Orthogonal to existing works, we remedy this problem within the corpus itself by proposing a Meta-Complement Topic Model, which improves the topic quality of short texts by transferring the semantic knowledge learned on long documents to complement semantically limited short texts. As a self-contained module, our framework is agnostic to auxiliary data and can be further improved by flexibly integrating such data. Specifically, when incorporating document connectivity, we further extend our framework to complement documents with limited edges. Experiments demonstrate the advantage of our framework.
format text
author ZHANG, Ce
LAUW, Hady Wirawan
author_facet ZHANG, Ce
LAUW, Hady Wirawan
author_sort ZHANG, Ce
title Meta-complementing the semantics of short texts in neural topic models
title_sort meta-complementing the semantics of short texts in neural topic models
publisher Institutional Knowledge at Singapore Management University
publishDate 2022
url https://ink.library.smu.edu.sg/sis_research/7609
https://ink.library.smu.edu.sg/context/sis_research/article/8612/viewcontent/neurips22.pdf
_version_ 1770576393725280256