Towards effective neural topic modeling
Main Author: | Wu, Xiaobao |
---|---|
Other Authors: | Luu Anh Tuan |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2025 |
Subjects: | Computer and Information Science; Neural networks; Deep learning; Topic model; Text mining; Variational autoencoder |
Online Access: | https://hdl.handle.net/10356/181934 |
Institution: | Nanyang Technological University |
Language: | English |
Abstract: |
Over the past few decades, the world has witnessed an unprecedented explosion of information. A substantial portion of it consists of unlabeled textual data, such as tweets, news articles, product reviews, and web snippets.
As labeling is extremely expensive, time-consuming, and sometimes biased,
effectively analyzing these data without supervision has become imperative.
Consequently, Neural Topic Models (NTMs) have emerged as a promising solution, attracting considerable research attention for their capability and interpretability.
They automatically discover latent topics from unlabeled textual data through neural networks, enabling unsupervised document understanding.
They have derived various downstream applications, such as content recommendation, trend analysis, and text summarization.
Compared to conventional topic models like LDA, NTMs offer structural flexibility and support gradient back-propagation,
which avoids complicated model-specific derivations and scales well to large datasets.
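To illustrate the general idea (this is a toy sketch, not the thesis's actual architecture or any specific model from it), a VAE-style NTM encodes a document's bag-of-words vector into a document-topic distribution and reconstructs the document from topic-word distributions, training end-to-end on the reconstruction error:

```python
import math, random

random.seed(0)

V, K = 6, 2  # toy vocabulary size and number of topics

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

# Toy encoder: one linear layer from bag-of-words to topic logits
# (a real NTM uses an MLP with a variational reparameterization layer).
W = [[random.uniform(-1, 1) for _ in range(V)] for _ in range(K)]
# Topic-word distributions (rows of the decoder), each summing to 1.
beta = [softmax([random.uniform(-1, 1) for _ in range(V)]) for _ in range(K)]

def forward(bow):
    logits = [sum(w * x for w, x in zip(row, bow)) for row in W]
    theta = softmax(logits)  # document-topic distribution
    # Reconstruct word probabilities as a topic-weighted mixture.
    recon = [sum(theta[k] * beta[k][v] for k in range(K)) for v in range(V)]
    # Negative log-likelihood of the observed words (reconstruction error).
    nll = -sum(x * math.log(p) for x, p in zip(bow, recon) if x > 0)
    return theta, nll

theta, loss = forward([2, 0, 1, 0, 0, 3])
print(len(theta), abs(sum(theta) - 1.0) < 1e-9, loss > 0)
```

Because every step is differentiable, the encoder and decoder can be trained jointly by back-propagation, which is exactly the flexibility the abstract contrasts with LDA's model-specific inference derivations.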
However, despite their promise, existing NTMs generally encounter several critical challenges. On the one hand, NTMs often produce low-quality topics, at times worse than those of conventional models.
These topics tend to be incoherent or repetitive, significantly diminishing their informativeness.
On the other hand, NTMs struggle with low inference ability, leading to less accurate topic distributions for documents.
This limitation greatly hinders document understanding and undermines subsequent analysis, reasoning, and decision-making processes.
Due to these challenges, existing NTMs are less useful and applicable for downstream tasks or applications.
As a result, it is necessary to enhance NTMs to deliver more reliable and informative topic modeling.
This thesis aims to advance neural topic modeling by addressing these key challenges.
In particular, we focus on four popular scenarios: short-text, dynamic, hierarchical, and basic neural topic modeling.
First, we propose a novel neural topic model tailored for short texts.
This model leverages a topic-semantic contrastive learning method to capture similarity relations among short text samples, and it works regardless of data augmentation availability.
This refines short text representations and enriches learning signals, effectively alleviating the data sparsity issue of short texts and producing informative topics.
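As a simplified illustration (not the thesis's exact objective), contrastive learning of this kind typically uses an InfoNCE-style loss that pulls a sample's representation toward a related "positive" sample and pushes it away from unrelated "negatives":

```python
import math

def cos_sim(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def contrastive_loss(anchor, positive, negatives, tau=0.5):
    # InfoNCE-style objective: maximize similarity to the positive sample
    # relative to the negatives; tau is the temperature hyperparameter.
    pos = math.exp(cos_sim(anchor, positive) / tau)
    neg = sum(math.exp(cos_sim(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))

anchor = [1.0, 0.0]
close, far = [0.9, 0.1], [-1.0, 0.1]
loss_good = contrastive_loss(anchor, close, [far])   # positive is similar
loss_bad = contrastive_loss(anchor, far, [close])    # positive is dissimilar
print(loss_good < loss_bad)
```

Minimizing such a loss over short-text representations is one way to inject the similarity relations that a sparse bag-of-words signal alone cannot provide.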
Second, we explore dynamic topic modeling.
We present a neural dynamic topic model that tracks the evolution of dynamic topics by building contrastive relations among them, rather than relying on conventional Markov chains. The model further explicitly excludes unassociated words from dynamic topics to enhance their alignment with their respective time slices.
These improvements enable our model to reliably track topic evolution with high diversity.
Third, we focus on the affinity, rationality, and diversity of hierarchical topic modeling.
Our proposed neural hierarchical topic model ensures the sparsity and balance of cross-level topic dependencies using a transport plan dependency method.
Moreover, it assigns different semantic granularities to topics at different levels through disentangled decoding.
With these, our model produces affinitive, rational, and diverse topic hierarchies.
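The transport-plan idea can be illustrated in simplified form (this is a generic Sinkhorn sketch, not the thesis's exact formulation) by turning a child-parent topic cost matrix into a dependency matrix whose rows are normalized and whose columns are balanced, with small entropic regularization encouraging sparse dependencies:

```python
import math

def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    # Entropy-regularized transport: a small eps concentrates each row's
    # mass on few parents (sparsity); alternating column/row scaling
    # balances how much dependency mass each parent topic receives.
    P = [[math.exp(-c / eps) for c in row] for row in cost]
    n = len(P)
    for _ in range(n_iters):
        for j in range(n):                      # column scaling (balance)
            s = sum(P[i][j] for i in range(n))
            for i in range(n):
                P[i][j] /= s
        for i in range(n):                      # row scaling (normalize)
            s = sum(P[i])
            P[i] = [x / s for x in P[i]]
    return P

# Toy cost: child topic i is semantically closest to parent topic i.
cost = [[0.0, 1.0, 1.0],
        [1.0, 0.0, 1.0],
        [1.0, 1.0, 0.0]]
P = sinkhorn_plan(cost)
row_sums = [sum(row) for row in P]
col_sums = [sum(P[i][j] for i in range(3)) for j in range(3)]
print(row_sums, P[0][0] > P[0][1])
```

Here each child's dependencies sum to one while no parent monopolizes the total mass, which is the sparsity-plus-balance property the abstract attributes to the transport-plan dependency method.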
Fourth, we tackle the topic collapsing issue and propose a basic neural topic model.
Apart from the common reconstruction error, this model introduces a new embedding clustering regularization to force each topic embedding to be the center of a separately aggregated word embedding cluster in the semantic space.
This produces topics with distinct semantics and effectively resolves the topic collapsing issue.
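A rough sketch of the underlying intuition (an illustrative nearest-center penalty, not the thesis's actual regularizer): assign each word embedding to its nearest topic embedding, then penalize the gap between every topic embedding and the mean of its assigned words, so each topic is pushed to the center of a separate word-embedding cluster:

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def embedding_clustering_reg(topic_embs, word_embs):
    # Assign each word embedding to its nearest topic embedding, then
    # penalize the distance between every topic embedding and the mean
    # of its assigned words. Collapsed topics that share one region of
    # the semantic space incur a large penalty.
    clusters = [[] for _ in topic_embs]
    for w in word_embs:
        k = min(range(len(topic_embs)), key=lambda k: dist(topic_embs[k], w))
        clusters[k].append(w)
    penalty = 0.0
    for t, words in zip(topic_embs, clusters):
        if words:
            center = [sum(c) / len(words) for c in zip(*words)]
            penalty += dist(t, center)
    return penalty

# Two word clusters around (0, 0) and (5, 0).
words = [[0.0, 0.1], [0.0, -0.1], [5.0, 0.1], [5.0, -0.1]]
well_placed = embedding_clustering_reg([[0.0, 0.0], [5.0, 0.0]], words)
collapsed = embedding_clustering_reg([[2.5, 0.0], [2.6, 0.0]], words)
print(well_placed < collapsed)
```

Topics sitting at distinct cluster centers incur near-zero penalty, while topics collapsed into one region are heavily penalized, which mirrors how such a regularizer discourages topic collapsing.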
Finally, we develop a comprehensive topic modeling toolkit that includes both conventional and cutting-edge neural topic models.
This toolkit covers the complete pipelines of topic modeling, such as dataset preprocessing, model training, and evaluation.
These features position our toolkit as a valuable resource for accelerating research on and applications of topic models.
In conclusion, this thesis makes significant contributions towards advancing neural topic modeling through multiple models, various scenarios, and a rigorous and comprehensive benchmark toolkit.
These contributions pave the way for the utilization of neural topic modeling in diverse real-world applications. |
Citation: | Wu, X. (2024). Towards effective neural topic modeling. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181934 |
School: | College of Computing and Data Science |
License: | This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). |