Keep it simple with time: A reexamination of probabilistic topic detection models

Topic detection (TD) is a fundamental research issue in the Topic Detection and Tracking (TDT) community with practical implications; TD helps analysts to separate the wheat from the chaff among the thousands of incoming news streams. In this paper, we propose a simple and effective topic detection...

Bibliographic Details
Main Authors: HE, Qi, CHANG, Kuiyu, LIM, Ee Peng, Banerjee, Arindam
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2010
Subjects:
DPM
Online Access: https://ink.library.smu.edu.sg/sis_research/1322
https://ink.library.smu.edu.sg/context/sis_research/article/2321/viewcontent/05374412.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-2321
record_format dspace
spelling sg-smu-ink.sis_research-2321 2018-06-13T07:00:10Z Keep it simple with time: A reexamination of probabilistic topic detection models HE, Qi CHANG, Kuiyu LIM, Ee Peng Banerjee, Arindam Topic detection (TD) is a fundamental research issue in the Topic Detection and Tracking (TDT) community with practical implications; TD helps analysts to separate the wheat from the chaff among the thousands of incoming news streams. In this paper, we propose a simple and effective topic detection model called the temporal Discriminative Probabilistic Model (DPM), which is shown to be theoretically equivalent to the classic vector space model with feature selection and temporally discriminative weights. We compare DPM to its various probabilistic cousins, ranging from mixture models like von Mises-Fisher (vMF) to mixed membership models like Latent Dirichlet Allocation (LDA). Benchmark results on the TDT3 data set show that sophisticated models, such as vMF and LDA, do not necessarily lead to better results; in the case of LDA, notably worse performance was obtained under variational inference, which is likely due to the significantly large number of LDA model parameters involved for document-level topic detection. In contrast, using a relatively simple time-aware probabilistic model such as DPM suffices for both offline and online topic detection tasks, making DPM a theoretically elegant and effective model for practical topic detection. 2010-01-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/1322 info:doi/10.1109/TPAMI.2009.203 https://ink.library.smu.edu.sg/context/sis_research/article/2321/viewcontent/05374412.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University DPM TFIDF Topic detection bursty feature online probabilistic model time-aware Databases and Information Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic DPM
TFIDF
Topic detection
bursty feature
online
probabilistic model
time-aware
Databases and Information Systems
description Topic detection (TD) is a fundamental research issue in the Topic Detection and Tracking (TDT) community with practical implications; TD helps analysts to separate the wheat from the chaff among the thousands of incoming news streams. In this paper, we propose a simple and effective topic detection model called the temporal Discriminative Probabilistic Model (DPM), which is shown to be theoretically equivalent to the classic vector space model with feature selection and temporally discriminative weights. We compare DPM to its various probabilistic cousins, ranging from mixture models like von Mises-Fisher (vMF) to mixed membership models like Latent Dirichlet Allocation (LDA). Benchmark results on the TDT3 data set show that sophisticated models, such as vMF and LDA, do not necessarily lead to better results; in the case of LDA, notably worse performance was obtained under variational inference, which is likely due to the significantly large number of LDA model parameters involved for document-level topic detection. In contrast, using a relatively simple time-aware probabilistic model such as DPM suffices for both offline and online topic detection tasks, making DPM a theoretically elegant and effective model for practical topic detection.
format text
author HE, Qi
CHANG, Kuiyu
LIM, Ee Peng
Banerjee, Arindam
author_sort HE, Qi
title Keep it simple with time: A reexamination of probabilistic topic detection models
title_sort keep it simple with time: a reexamination of probabilistic topic detection models
publisher Institutional Knowledge at Singapore Management University
publishDate 2010
url https://ink.library.smu.edu.sg/sis_research/1322
https://ink.library.smu.edu.sg/context/sis_research/article/2321/viewcontent/05374412.pdf
_version_ 1770570965784199168