Topic Modeling with Document Relative Similarities

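Topic modeling has been widely used in text mining. Previous topic models such as Latent Dirichlet Allocation (LDA) are successful in learning hidden topics but they do not take into account metadata of documents. To tackle this problem, many augmented topic models have been proposed to jointly model text and metadata. But most existing models handle only categorical and numerical types of metadata. We identify another type of metadata that can be more natural to obtain in some scenarios. These are relative similarities among documents. In this paper, we propose a general model that links LDA with constraints derived from document relative similarities. Specifically, in our model, the constraints act as a regularizer of the log likelihood of LDA. We fit the proposed model using Gibbs-EM. Experiments with two real-world datasets show that our model is able to learn meaningful topics. The results also show that our model outperforms the baselines in terms of topic coherence and a document classification task.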
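The abstract says that constraints derived from relative similarities act as a regularizer of the LDA log likelihood, but the record does not give the paper's exact constraint form. The sketch below is a generic illustration only, under stated assumptions: a relative similarity is a triplet (a, b, c) read as "document a is more similar to b than to c", violations are scored with a hinge loss on squared Euclidean distances between per-document topic proportions, and the penalty is subtracted from a given log likelihood with weight `lam`. All names (`triplet_penalty`, `regularized_objective`, `margin`, `lam`) are hypothetical and not taken from the paper, and the sketch does not implement the Gibbs-EM fitting procedure.

```python
from typing import Sequence, Tuple

Vector = Sequence[float]
Triplet = Tuple[int, int, int]  # (a, b, c): a should be closer to b than to c


def squared_distance(p: Vector, q: Vector) -> float:
    """Squared Euclidean distance between two topic-proportion vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def triplet_penalty(theta: Sequence[Vector], triplets: Sequence[Triplet],
                    margin: float = 0.1) -> float:
    """Sum of hinge violations over relative-similarity triplets.

    ASSUMPTION: the hinge loss and squared Euclidean distance are
    illustrative choices, not the formulation used in the paper.
    """
    total = 0.0
    for a, b, c in triplets:
        violation = (margin
                     + squared_distance(theta[a], theta[b])
                     - squared_distance(theta[a], theta[c]))
        total += max(0.0, violation)  # satisfied triplets contribute 0
    return total


def regularized_objective(log_likelihood: float, theta: Sequence[Vector],
                          triplets: Sequence[Triplet], lam: float = 1.0,
                          margin: float = 0.1) -> float:
    """Log likelihood minus a weighted constraint penalty (hypothetical form)."""
    return log_likelihood - lam * triplet_penalty(theta, triplets, margin)


if __name__ == "__main__":
    # Three documents over two topics; documents 0 and 1 are close, 2 is far.
    theta = [[0.8, 0.2], [0.7, 0.3], [0.1, 0.9]]
    print(triplet_penalty(theta, [(0, 1, 2)]))  # consistent triplet
    print(triplet_penalty(theta, [(0, 2, 1)]))  # violated triplet
```

In this toy setting the triplet (0, 1, 2) agrees with the topic proportions and adds no penalty, while (0, 2, 1) contradicts them and lowers the objective, which is the sense in which such constraints can steer the learned topics.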

Bibliographic Details
Main Authors: DU, Jianguang; JIANG, Jing; SONG, Dandan; LIAO, Lejian
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2015
Subjects: Computer Sciences; Databases and Information Systems
Online Access: https://ink.library.smu.edu.sg/sis_research/3070
https://ink.library.smu.edu.sg/context/sis_research/article/4070/viewcontent/P_ID_52343_IJCAI15_488_TopicModelingDocRelSimilarities.pdf
Institution: Singapore Management University
Record ID: sg-smu-ink.sis_research-4070
Collection: Research Collection School Of Computing and Information Systems (InK@SMU)
Date: 2015-07-31
License: http://creativecommons.org/licenses/by-nc-nd/4.0/