Bursty feature representation for clustering text streams
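Text representation plays a crucial role in classical text mining, where the primary focus was on static text. Nevertheless, well-studied static text representations including TFIDF are not optimized for non-stationary streams of information such as news, discussion board messages, and blogs. We therefore introduce a new temporal representation for text streams based on bursty features. Our bursty text representation differs significantly from traditional schemes in that it 1) dynamically represents documents over time, 2) amplifies a feature in proportion to its burstiness at any point in time, and 3) is topic independent. Our bursty text representation model was evaluated against a classical bag-of-words text representation on the task of clustering TDT3 topical text streams. It was shown to consistently yield more cohesive clusters in terms of cluster purity and cluster/class entropies. This new temporal bursty text representation can be extended to most text mining tasks involving a temporal dimension, such as modeling of online blog pages.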
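The abstract does not say how burstiness is measured or how bursty weights combine with term frequencies. The Python sketch below is a hypothetical illustration under stated assumptions, not the authors' method. `burst_scores` treats a term as bursty when the fraction of documents containing it in the current time window exceeds its stream-wide fraction, and `bursty_vector` scales raw term frequency by (1 + burst score). The evaluation helpers use common textbook definitions of cluster purity and size-weighted cluster/class entropies; the paper's exact formulas may differ. All function names are invented for illustration.

```python
import math
from collections import Counter


def burst_scores(window_docs, all_docs):
    """Hypothetical per-term burstiness score (assumption, not the paper's formula).

    Assumes window_docs is a subset of all_docs, so df_all >= df_win > 0.
    """
    df_win = Counter(t for doc in window_docs for t in set(doc))
    df_all = Counter(t for doc in all_docs for t in set(doc))
    scores = {}
    for term, df in df_win.items():
        p_win = df / len(window_docs)
        p_all = df_all[term] / len(all_docs)
        # Relative excess over the stream-wide rate, floored at 0 for non-bursty terms.
        scores[term] = max(0.0, p_win / p_all - 1.0)
    return scores


def bursty_vector(doc, scores):
    """Amplify raw term frequency by (1 + burst score) for the doc's window."""
    tf = Counter(doc)
    return {t: c * (1.0 + scores.get(t, 0.0)) for t, c in tf.items()}


def _contingency(groups, labels):
    """Count labels within each group: {group: Counter(label -> count)}."""
    table = {}
    for g, y in zip(groups, labels):
        table.setdefault(g, Counter())[y] += 1
    return table


def purity(clusters, classes):
    """Fraction of documents that belong to the majority class of their cluster."""
    table = _contingency(clusters, classes)
    return sum(max(c.values()) for c in table.values()) / len(classes)


def weighted_entropy(groups, labels):
    """Size-weighted average (in bits) of the label entropy inside each group."""
    n = len(labels)
    total = 0.0
    for counts in _contingency(groups, labels).values():
        size = sum(counts.values())
        h = -sum((k / size) * math.log2(k / size) for k in counts.values())
        total += (size / n) * h
    return total


def cluster_entropy(clusters, classes):
    # Class mix inside each cluster (lower means purer clusters).
    return weighted_entropy(clusters, classes)


def class_entropy(clusters, classes):
    # Spread of each class across clusters (lower means classes are not split).
    return weighted_entropy(classes, clusters)


if __name__ == "__main__":
    stream = [["a", "b"], ["a", "c"], ["a", "d"], ["x", "y"], ["x", "a"]]
    scores = burst_scores(stream[3:], stream)  # last two docs form the window
    print(bursty_vector(["x", "x", "a"], scores))  # {'x': 5.0, 'a': 1.0}
    print(purity([0, 0, 1, 1], ["a", "a", "b", "a"]))  # 0.75
```

In this sketch "x" appears in every window document but in only 40% of the stream, so its weight is multiplied by 2.5, while the common term "a" keeps its raw count. A TFIDF base weight could be substituted for raw counts without changing the amplification step.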
Main Authors: He, Qi; Chang, Kuiyu; Lim, Ee Peng; Zhang, Jun
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2007
Subjects: Databases and Information Systems; Numerical Analysis and Scientific Computing
Online Access: https://ink.library.smu.edu.sg/sis_research/1273, http://doi.org/10.1137/1.9781611972771.50
Institution: Singapore Management University
Record ID: sg-smu-ink.sis_research-2272
Record format: dspace
Record last modified: 2018-06-22
Date: 2007-04-01
Collection: Research Collection School Of Computing and Information Systems (InK@SMU)
Building: SMU Libraries
Content provider: SMU Libraries
Country: Singapore (Asia)