Bursty feature representation for clustering text streams

Bibliographic Details
Main Authors: HE, Qi; CHANG, Kuiyu; LIM, Ee Peng; ZHANG, Jun
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2007
Subjects: Databases and Information Systems; Numerical Analysis and Scientific Computing
Online Access: https://ink.library.smu.edu.sg/sis_research/1273
http://doi.org/10.1137/1.9781611972771.50
Institution: Singapore Management University
Date: 2007-04-01
Collection: Research Collection School Of Computing and Information Systems
Description: Text representation plays a crucial role in classical text mining, where the primary focus was on static text. However, well-studied static text representations, including TF-IDF, are not optimized for non-stationary streams of information such as news, discussion board messages, and blogs. We therefore introduce a new temporal representation for text streams based on bursty features. Our bursty text representation differs significantly from traditional schemes in that it 1) dynamically represents documents over time, 2) amplifies a feature in proportion to its burstiness at any point in time, and 3) is topic independent. We evaluated the bursty text representation model against a classical bag-of-words text representation on the task of clustering TDT3 topical text streams, where it consistently yielded more cohesive clusters in terms of cluster purity and cluster/class entropies. This new temporal bursty text representation can be extended to most text mining tasks involving a temporal dimension, such as modeling online blog pages.
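The description does not say how burstiness is measured or how features are amplified, so the following Python is only a rough sketch under stated assumptions, not the authors' method. It assumes a simple stand-in burst score: a term's weight in a time window is how far its share of documents in that window exceeds its share over the whole stream, minus 1 and clipped at 0. Each document's TF-IDF weight is then scaled by (1 + alpha × burst weight). The functions burst_weights, bursty_vectors, purity and weighted_entropy, the parameter alpha, and the toy stream are hypothetical. The two evaluation helpers follow common textbook definitions of purity and size-weighted entropy, which may differ in detail from those used in the TDT3 experiments.

```python
import math
from collections import Counter

# Hypothetical sketch: a windowed document-frequency ratio stands in for the
# paper's burst model, which the description does not specify.


def doc_freq(docs):
    """Count how many documents contain each term (each doc is a token list)."""
    return Counter(term for doc in docs for term in set(doc))


def burst_weights(docs_by_window):
    """Per-window burst weight for every term seen in that window.

    Assumption: weight = (share of window docs containing the term) divided by
    (share of all stream docs containing it), minus 1, clipped at 0. A term
    spread evenly over time scores 0; one concentrated in a window scores > 0.
    """
    all_docs = [doc for window in docs_by_window for doc in window]
    global_df = doc_freq(all_docs)
    n_total = len(all_docs)
    weights = []
    for window in docs_by_window:
        local_df = doc_freq(window)
        weights.append({
            term: max(0.0, (df / len(window)) / (global_df[term] / n_total) - 1.0)
            for term, df in local_df.items()
        })
    return weights


def bursty_vectors(docs_by_window, alpha=1.0):
    """TF-IDF vectors with each weight scaled by (1 + alpha * burst weight).

    alpha is a hypothetical amplification knob; alpha=0 gives plain TF-IDF.
    Returns one {term: weight} dict per document, in stream order.
    """
    weights = burst_weights(docs_by_window)
    all_docs = [doc for window in docs_by_window for doc in window]
    df = doc_freq(all_docs)
    n_total = len(all_docs)
    vectors = []
    for t, window in enumerate(docs_by_window):
        for doc in window:
            vectors.append({
                term: count * math.log(n_total / df[term])
                * (1.0 + alpha * weights[t].get(term, 0.0))
                for term, count in Counter(doc).items()
            })
    return vectors


def _contingency(groups, labels):
    """Map each group id to a Counter of its members' labels."""
    table = {}
    for g, y in zip(groups, labels):
        table.setdefault(g, Counter())[y] += 1
    return table


def purity(clusters, classes):
    """Fraction of documents that belong to their cluster's majority class."""
    table = _contingency(clusters, classes)
    return sum(max(c.values()) for c in table.values()) / len(classes)


def weighted_entropy(groups, labels):
    """Size-weighted mean entropy (bits) of the label mix inside each group."""
    n = len(labels)
    total = 0.0
    for counts in _contingency(groups, labels).values():
        size = sum(counts.values())
        h = -sum((k / size) * math.log2(k / size) for k in counts.values())
        total += (size / n) * h
    return total


if __name__ == "__main__":
    stream = [
        [["election", "vote"], ["vote", "poll"]],  # window 0
        [["quake", "rescue"], ["quake", "vote"]],  # window 1
    ]
    for vec in bursty_vectors(stream):
        print(vec)
    clusters = [0, 0, 1, 1]  # hypothetical clustering output
    classes = ["politics", "politics", "disaster", "disaster"]
    print(purity(clusters, classes), weighted_entropy(clusters, classes))
```

Calling weighted_entropy(clusters, classes) gives a cluster entropy, and swapping the arguments gives the corresponding class entropy; lower is more cohesive in both cases. Setting alpha=0 reduces the vectors to plain TF-IDF weighting, which makes a side-by-side comparison with a non-temporal representation easy. The resulting vectors can be passed to any clustering algorithm; none is assumed here.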