A spectroscopy of texts for effective clustering


Bibliographic Details
Main Authors: LI, Wenyuan; NG, Wee-Keong; ONG, Kok-Leong; LIM, Ee Peng
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2004
Subjects: Databases and Information Systems; Numerical Analysis and Scientific Computing
Online Access: https://ink.library.smu.edu.sg/sis_research/1018
https://ink.library.smu.edu.sg/context/sis_research/article/2017/viewcontent/Li2004_Chapter_ASpectroscopyOfTextsForEffecti.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-2017
record_format dspace
spelling sg-smu-ink.sis_research-2017 2018-06-22T02:50:37Z 2004-09-01T07:00:00Z text application/pdf info:doi/10.1007/978-3-540-30116-5_29 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Databases and Information Systems
Numerical Analysis and Scientific Computing
description For many clustering algorithms, such as k-means, EM, and CLOPE, there is usually a requirement to set some parameters. Often, these parameters directly or indirectly control the number of clusters to return. In the presence of different data characteristics and analysis contexts, it is often difficult for the user to estimate the number of clusters in the data set. This is especially true in text collections such as Web documents, images or biological data. The fundamental question this paper addresses is: "How can we effectively estimate the natural number of clusters in a given text collection?" We propose to use spectral analysis, which analyzes the eigenvalues (not eigenvectors) of the collection, as the solution to the above. We first present the relationship between a text collection and its underlying spectra. We then show how the answer to this question enhances the clustering process. Finally, we conclude with empirical results and related work.
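Note: this record does not reproduce the paper's procedure, so the following is only a minimal sketch of one generic way to estimate a cluster count from eigenvalues alone (the common eigengap heuristic). It is not the authors' method. The function name `estimate_num_clusters`, the `max_k` parameter, the cosine-similarity construction, and the symmetric normalization are all assumptions made for illustration.

```python
import numpy as np


def estimate_num_clusters(doc_term, max_k=None):
    """Guess the number of clusters from the eigenvalue spectrum.

    Hypothetical helper (not from the paper). Assumes `doc_term` holds
    non-negative term counts, one row per document.
    """
    X = np.asarray(doc_term, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two documents")

    # Cosine similarity between documents (all-zero rows are left as zeros).
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.where(norms == 0.0, 1.0, norms)
    S = X @ X.T

    # Symmetric normalization D^{-1/2} S D^{-1/2}; each well-separated
    # group of documents then contributes one eigenvalue close to 1.
    degree = S.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.where(degree > 0.0, degree, 1.0))
    N = S * inv_sqrt[:, None] * inv_sqrt[None, :]

    # Only eigenvalues are used, sorted from largest to smallest.
    eigvals = np.linalg.eigvalsh(N)[::-1]

    # Eigengap heuristic: k is the position of the largest drop
    # between consecutive eigenvalues among the first `limit` of them.
    limit = n - 1 if max_k is None else max(1, min(max_k, n - 1))
    gaps = eigvals[:limit] - eigvals[1:limit + 1]
    return int(np.argmax(gaps)) + 1, eigvals


if __name__ == "__main__":
    # Toy collection: three groups of documents with disjoint vocabularies.
    base = np.array([[3, 1, 0], [2, 1, 0], [3, 2, 1], [4, 1, 1]])
    docs = np.kron(np.eye(3), base)
    k, spectrum = estimate_num_clusters(docs)
    print("estimated clusters:", k)
    print("leading eigenvalues:", np.round(spectrum[:5], 3))
```

The estimate could then be passed as the cluster-count parameter to an algorithm such as k-means. On real text collections the gaps are far less clear-cut than in this toy case, so the result is best treated as a starting point.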
format text
author LI, Wenyuan
NG, Wee-Keong
ONG, Kok-Leong
LIM, Ee Peng
title A spectroscopy of texts for effective clustering
publisher Institutional Knowledge at Singapore Management University
publishDate 2004
url https://ink.library.smu.edu.sg/sis_research/1018
https://ink.library.smu.edu.sg/context/sis_research/article/2017/viewcontent/Li2004_Chapter_ASpectroscopyOfTextsForEffecti.pdf
_version_ 1770570824226439168