Modeling Syntactic Structures of Topics with a Nested HMM-LDA

Latent Dirichlet allocation (LDA) is a commonly used topic modeling method for text analysis and mining. Standard LDA treats documents as bags of words, ignoring the syntactic structures of sentences. In this paper, we propose a hybrid model that embeds hidden Markov models (HMMs) within LDA topics...

Bibliographic Details
Main Author: JIANG, Jing
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2009
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/351
http://dx.doi.org/10.1109/ICDM.2009.144
Institution: Singapore Management University
id sg-smu-ink.sis_research-1350
record_format dspace
last_modified 2018-07-04T08:27:55Z
date 2009-12-01T08:00:00Z
file_format application/pdf
doi info:doi/10.1109/ICDM.2009.144
license http://creativecommons.org/licenses/by-nc-nd/4.0/
series Research Collection School Of Computing and Information Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic background functional words
hidden Markov models
latent Dirichlet allocation
syntactic structure modeling
text analysis
text mining
topic modeling method
topic-specific content words
Computer Sciences
Numerical Analysis and Scientific Computing
description Latent Dirichlet allocation (LDA) is a commonly used topic modeling method for text analysis and mining. Standard LDA treats documents as bags of words, ignoring the syntactic structures of sentences. In this paper, we propose a hybrid model that embeds hidden Markov models (HMMs) within LDA topics to jointly model both the topics and the syntactic structures within each topic. Our model is general and subsumes standard LDA and HMM as special cases. Unlike standard LDA and HMM, our model can simultaneously discover both topic-specific content words and background functional words shared among topics. Our model can also automatically separate content words that play different roles within a topic. Using perplexity as the evaluation metric, our model achieves lower perplexity on unseen test documents than standard LDA, indicating that it generalizes better.
format text
author JIANG, Jing
title Modeling Syntactic Structures of Topics with a Nested HMM-LDA
publisher Institutional Knowledge at Singapore Management University
publishDate 2009
url https://ink.library.smu.edu.sg/sis_research/351
http://dx.doi.org/10.1109/ICDM.2009.144