Towards effective neural topic modeling
Main Author: | Wu, Xiaobao |
---|---|
Other Authors: | Luu Anh Tuan |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2025 |
Subjects: | Computer and Information Science; Neural networks; Deep learning; Topic model; Text mining; Variational autoencoder |
Online Access: | https://hdl.handle.net/10356/181934 |
Institution: | Nanyang Technological University |
Language: | English |
Abstract: |
Over the past few decades, the world has witnessed an unprecedented explosion of information. A substantial portion of it consists of unlabeled textual data, such as tweets, news articles, product reviews, and web snippets.
As labeling is extremely expensive, time-consuming, and sometimes biased,
effectively analyzing these data without supervision has become imperative.
Consequently, Neural Topic Models (NTMs) have emerged as a promising solution, attracting considerable research attention for their capability and interpretability.
They automatically discover latent topics from unlabeled textual data through neural networks, enabling unsupervised document understanding.
They have derived various downstream applications, such as content recommendation, trend analysis, and text summarization.
Compared to conventional topic models like LDA, NTMs offer structural flexibility and support gradient back-propagation,
which avoids complicated model-specific derivations and scales well to large datasets.
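To illustrate the general idea (this is a toy sketch, not the thesis's actual architecture or any specific model from it), a VAE-style NTM encodes a document's bag-of-words vector into a document-topic distribution and reconstructs the document from topic-word distributions, training end-to-end on the reconstruction error:

```python
import math, random

random.seed(0)

V, K = 6, 2  # toy vocabulary size and number of topics

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

# Toy encoder: one linear layer from bag-of-words to topic logits
# (a real NTM uses an MLP with a variational reparameterization layer).
W = [[random.uniform(-1, 1) for _ in range(V)] for _ in range(K)]
# Topic-word distributions (rows of the decoder), each summing to 1.
beta = [softmax([random.uniform(-1, 1) for _ in range(V)]) for _ in range(K)]

def forward(bow):
    logits = [sum(w * x for w, x in zip(row, bow)) for row in W]
    theta = softmax(logits)  # document-topic distribution
    # Reconstruct word probabilities as a topic-weighted mixture.
    recon = [sum(theta[k] * beta[k][v] for k in range(K)) for v in range(V)]
    # Negative log-likelihood of the observed words (reconstruction error).
    nll = -sum(x * math.log(p) for x, p in zip(bow, recon) if x > 0)
    return theta, nll

theta, loss = forward([2, 0, 1, 0, 0, 3])
print(len(theta), abs(sum(theta) - 1.0) < 1e-9, loss > 0)
```

Because every step is differentiable, the encoder and decoder can be trained jointly by back-propagation, which is exactly the flexibility the abstract contrasts with LDA's model-specific inference derivations.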
However, despite their promise, existing NTMs generally encounter several critical challenges. On the one hand, NTMs often produce low-quality topics, at times worse than those of conventional models.
These topics tend to be incoherent or repetitive, significantly diminishing their informativeness.
On the other hand, NTMs struggle with low inference ability, leading to less accurate topic distributions for documents.
This limitation greatly hinders document understanding and undermines subsequent analysis, reasoning, and decision-making processes.
Due to these challenges, existing NTMs are less useful and applicable for downstream tasks or applications.
As a result, it is necessary to enhance NTMs to deliver more reliable and informative topic modeling.
This thesis aims to advance neural topic modeling by addressing these key challenges.
In particular, we focus on four popular scenarios: short-text, dynamic, hierarchical, and basic neural topic modeling.
First, we propose a novel neural topic model tailored for short texts.
This model leverages a topic-semantic contrastive learning method to capture similarity relations among short text samples, and it works regardless of data augmentation availability.
This refines short text representations and enriches learning signals, effectively alleviating the data sparsity issue of short texts and producing informative topics.
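As a simplified illustration (not the thesis's exact objective), contrastive learning of this kind typically uses an InfoNCE-style loss that pulls a sample's representation toward a related "positive" sample and pushes it away from unrelated "negatives":

```python
import math

def cos_sim(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def contrastive_loss(anchor, positive, negatives, tau=0.5):
    # InfoNCE-style objective: maximize similarity to the positive sample
    # relative to the negatives; tau is the temperature hyperparameter.
    pos = math.exp(cos_sim(anchor, positive) / tau)
    neg = sum(math.exp(cos_sim(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))

anchor = [1.0, 0.0]
close, far = [0.9, 0.1], [-1.0, 0.1]
loss_good = contrastive_loss(anchor, close, [far])   # positive is similar
loss_bad = contrastive_loss(anchor, far, [close])    # positive is dissimilar
print(loss_good < loss_bad)
```

Minimizing such a loss over short-text representations is one way to inject the similarity relations that a sparse bag-of-words signal alone cannot provide.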
Second, we explore dynamic topic modeling.
We present a neural dynamic topic model that tracks the evolution of dynamic topics by building contrastive relations among them, rather than relying on conventional Markov chains. The model further explicitly excludes unassociated words from dynamic topics to enhance their alignment with their respective time slices.
These improvements enable our model to reliably track topic evolution with high diversity.
Third, we focus on the affinity, rationality, and diversity of hierarchical topic modeling.
Our proposed neural hierarchical topic model ensures the sparsity and balance of cross-level topic dependencies using a transport plan dependency method.
Moreover, it assigns different semantic granularities to topics at different levels through disentangled decoding.
With these, our model produces affinitive, rational, and diverse topic hierarchies.
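The transport-plan idea can be illustrated in simplified form (this is a generic Sinkhorn sketch, not the thesis's exact formulation) by turning a child-parent topic cost matrix into a dependency matrix whose rows are normalized and whose columns are balanced, with small entropic regularization encouraging sparse dependencies:

```python
import math

def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    # Entropy-regularized transport: a small eps concentrates each row's
    # mass on few parents (sparsity); alternating column/row scaling
    # balances how much dependency mass each parent topic receives.
    P = [[math.exp(-c / eps) for c in row] for row in cost]
    n = len(P)
    for _ in range(n_iters):
        for j in range(n):                      # column scaling (balance)
            s = sum(P[i][j] for i in range(n))
            for i in range(n):
                P[i][j] /= s
        for i in range(n):                      # row scaling (normalize)
            s = sum(P[i])
            P[i] = [x / s for x in P[i]]
    return P

# Toy cost: child topic i is semantically closest to parent topic i.
cost = [[0.0, 1.0, 1.0],
        [1.0, 0.0, 1.0],
        [1.0, 1.0, 0.0]]
P = sinkhorn_plan(cost)
row_sums = [sum(row) for row in P]
col_sums = [sum(P[i][j] for i in range(3)) for j in range(3)]
print(row_sums, P[0][0] > P[0][1])
```

Here each child's dependencies sum to one while no parent monopolizes the total mass, which is the sparsity-plus-balance property the abstract attributes to the transport-plan dependency method.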
Fourth, we tackle the topic collapsing issue and propose a basic neural topic model.
Apart from the common reconstruction error, this model introduces a new embedding clustering regularization to force each topic embedding to be the center of a separately aggregated word embedding cluster in the semantic space.
This produces topics with distinct semantics and effectively resolves the topic collapsing issue.
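A rough sketch of the underlying intuition (an illustrative nearest-center penalty, not the thesis's actual regularizer): assign each word embedding to its nearest topic embedding, then penalize the gap between every topic embedding and the mean of its assigned words, so each topic is pushed to the center of a separate word-embedding cluster:

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def embedding_clustering_reg(topic_embs, word_embs):
    # Assign each word embedding to its nearest topic embedding, then
    # penalize the distance between every topic embedding and the mean
    # of its assigned words. Collapsed topics that share one region of
    # the semantic space incur a large penalty.
    clusters = [[] for _ in topic_embs]
    for w in word_embs:
        k = min(range(len(topic_embs)), key=lambda k: dist(topic_embs[k], w))
        clusters[k].append(w)
    penalty = 0.0
    for t, words in zip(topic_embs, clusters):
        if words:
            center = [sum(c) / len(words) for c in zip(*words)]
            penalty += dist(t, center)
    return penalty

# Two word clusters around (0, 0) and (5, 0).
words = [[0.0, 0.1], [0.0, -0.1], [5.0, 0.1], [5.0, -0.1]]
well_placed = embedding_clustering_reg([[0.0, 0.0], [5.0, 0.0]], words)
collapsed = embedding_clustering_reg([[2.5, 0.0], [2.6, 0.0]], words)
print(well_placed < collapsed)
```

Topics sitting at distinct cluster centers incur near-zero penalty, while topics collapsed into one region are heavily penalized, which mirrors how such a regularizer discourages topic collapsing.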
Finally, we develop a comprehensive topic modeling toolkit that includes both conventional and cutting-edge neural topic models.
This toolkit covers the complete pipelines of topic modeling, such as dataset preprocessing, model training, and evaluation.
These features position our toolkit as a valuable resource for accelerating research on and applications of topic models.
In conclusion, this thesis makes significant contributions towards advancing neural topic modeling through multiple models, various scenarios, and a rigorous and comprehensive benchmark toolkit.
These contributions pave the way for the utilization of neural topic modeling in diverse real-world applications. |
Citation: | Wu, X. (2024). Towards effective neural topic modeling. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181934 |
School: | College of Computing and Data Science |
License: | This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). |