Visualization and analysis of document clusters produced by self-organizing maps
Main Author: | Landrito, Maynard R. |
---|---|
Format: | text |
Language: | English |
Published: | Animo Repository, 2013 |
Subjects: | Document clustering; Cluster analysis |
Online Access: | https://animorepository.dlsu.edu.ph/etd_masteral/4372 |
Institution: | De La Salle University |
id | oai:animorepository.dlsu.edu.ph:etd_masteral-11210 |
---|---|
record_format | eprints |
datestamp | 2021-01-18T03:33:18Z |
source | Master's Theses |
institution | De La Salle University |
building | De La Salle University Library |
continent | Asia |
country | Philippines |
content_provider | De La Salle University Library |
collection | DLSU Institutional Repository |
language | English |
topic | Document clustering; Cluster analysis |
description | Information overload, caused by the huge number of available text documents, makes those documents increasingly difficult to organize and analyze. Text document clustering alleviates this problem by automatically grouping related documents together. However, documents usually yield very high-dimensional data, which makes processing them resource-intensive. The Random Projection Method (RPM) is shown to reduce the dimensionality of a large document dataset. This dimensionality reduction scheme is then coupled with Self-Organizing Maps (SOM) to organize the documents in the dataset, and K-Means clustering is performed on the SOM units to produce clusters of the documents organized within the SOM. Various SOM-based properties were introduced, together with a method to measure and visualize them. These properties allowed detailed analysis of the clusters and aided in finding outliers in the dataset, overlap between clusters, concentration of documents within clusters, possible subclusters, and the quality of different parts of clusters, among others. Cross-referencing the different property visualizations provided internal validation of the observations. In future work, the SOM-based properties and their visualizations can be used for interactive document selection, recommendation systems, and quality measurement. |
format | text |
author | Landrito, Maynard R. |
title | Visualization and analysis of document clusters produced by self-organizing maps |
publisher | Animo Repository |
publishDate | 2013 |
url | https://animorepository.dlsu.edu.ph/etd_masteral/4372 |
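The abstract reports that the Random Projection Method (RPM) reduces the dimensionality of the document vectors before the SOM stage. This record does not say which projection matrix the thesis used. The sketch below is therefore a minimal illustration under assumed conditions:

- a dense Gaussian matrix with entries drawn from N(0, 1/k);
- sparse term-weight documents stored as `{term_index: weight}` dicts;
- hypothetical function names (`random_projection_columns`, `project_sparse`).

```python
import math
import random


def random_projection_columns(d_in, d_out, seed=0):
    """Hypothetical helper: Gaussian random projection stored column by column.

    columns[j] is the d_out-dimensional image of term j. Entries are drawn
    from N(0, 1/d_out), so squared vector lengths are preserved in expectation.
    """
    rng = random.Random(seed)
    std = 1.0 / math.sqrt(d_out)
    return [[rng.gauss(0.0, std) for _ in range(d_out)] for _ in range(d_in)]


def project_sparse(doc, columns):
    """Project a sparse document {term_index: weight} to d_out dimensions.

    Computes y = sum_j weight_j * columns[j], touching only nonzero terms.
    """
    d_out = len(columns[0])
    y = [0.0] * d_out
    for j, w in doc.items():
        col = columns[j]
        for i in range(d_out):
            y[i] += w * col[i]
    return y


if __name__ == "__main__":
    cols = random_projection_columns(d_in=1000, d_out=256, seed=42)
    doc = {3: 1.0, 17: 2.0, 999: 0.5}  # toy term weights (assumed, not thesis data)
    y = project_sparse(doc, cols)
    ratio = sum(v * v for v in y) / sum(w * w for w in doc.values())
    print(len(y), round(ratio, 2))  # ratio should be close to 1
```

Storing the matrix by column keeps the cost of projecting one document proportional to its number of nonzero terms times k. Sparse random matrices, such as those proposed by Achlioptas, are a common cheaper alternative to the Gaussian choice assumed here.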
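After dimensionality reduction, the documents are organized with a Self-Organizing Map. This record gives no training details, such as the grid shape, neighborhood function or learning-rate schedule. The following online SOM is therefore only an illustrative sketch under assumed settings:

- a rectangular grid;
- a Gaussian neighborhood measured on the grid;
- a learning rate and radius that both decay linearly.

`train_som` and `best_matching_unit` are hypothetical names. In the thesis pipeline, the input vectors would be the projected document vectors.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(x, weights):
    """Grid position (row, col) whose codebook vector is closest to x."""
    return min(weights, key=lambda pos: sq_dist(x, weights[pos]))


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols rectangular SOM with online (per-sample) updates.

    Returns a dict mapping grid position (row, col) -> codebook vector.
    """
    rng = random.Random(seed)
    # Initialize each unit with a copy of a randomly chosen input vector.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood radius shrinks
            br, bc = best_matching_unit(x, weights)
            for (r, c), w in weights.items():
                # Gaussian neighborhood measured on the map grid, not in input space.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


if __name__ == "__main__":
    toy = [[0.1 * i, 0.1 * j] for i in range(4) for j in range(4)]
    toy += [[10 + 0.1 * i, 10 + 0.1 * j] for i in range(4) for j in range(4)]
    som = train_som(toy, rows=3, cols=3)
    print(best_matching_unit([0, 0], som), best_matching_unit([10, 10], som))
```

With `lr0 <= 1` every update is a convex combination, so codebook vectors stay inside the range spanned by the data. A batch SOM is a common alternative for large collections.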
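According to the abstract, K-Means is run on the SOM units rather than on the documents directly. Each document is then grouped by the unit it maps to. This record does not state the number of clusters or the initialization. The sketch below makes three assumptions:

- plain Lloyd's iterations with randomly chosen initial centroids;
- a trained SOM represented as a dict from grid position to codebook vector;
- each document inheriting the cluster of its best-matching unit.

All function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def cluster_som_units(weights, k, seed=0):
    """Cluster SOM codebook vectors; returns {grid position: cluster id}."""
    positions = sorted(weights)
    _, labels = kmeans([weights[pos] for pos in positions], k, seed=seed)
    return dict(zip(positions, labels))


def document_clusters(docs, weights, unit_cluster):
    """Each document inherits the cluster of its best-matching SOM unit."""
    result = []
    for x in docs:
        bmu = min(weights, key=lambda pos: sq_dist(x, weights[pos]))
        result.append(unit_cluster[bmu])
    return result
```

Clustering the units instead of the raw documents means K-Means works on a few hundred codebook vectors rather than the full collection. This is a common motivation for two-level SOM clustering.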
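The abstract mentions SOM-based properties for spotting outliers, overlap between clusters, and document concentration, but this record does not define them. The sketch below is a hedged illustration only, not the thesis's own definitions. It computes three standard SOM diagnostics that address similar questions:

- a hit map: the number of documents per unit;
- a U-matrix: the mean codebook distance to the 4-connected grid neighbours, where high values suggest cluster borders;
- per-document quantization error: the distance to the best-matching unit, where large values suggest outliers.

Function names are hypothetical.

```python
import math
from collections import Counter


def dist(a, b):
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bmu(x, weights):
    """Grid position of the unit whose codebook vector is closest to x."""
    return min(weights, key=lambda pos: dist(x, weights[pos]))


def hit_map(docs, weights):
    """Number of documents mapped to each unit (document concentration)."""
    counts = Counter(bmu(x, weights) for x in docs)
    return {pos: counts[pos] for pos in weights}


def u_matrix(weights):
    """Mean codebook distance from each unit to its 4-connected grid neighbours."""
    out = {}
    for (r, c), w in weights.items():
        neighbours = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        ds = [dist(w, weights[n]) for n in neighbours if n in weights]
        out[(r, c)] = sum(ds) / len(ds) if ds else 0.0
    return out


def quantization_errors(docs, weights):
    """Distance from each document to its best-matching unit."""
    return [dist(x, weights[bmu(x, weights)]) for x in docs]
```

Because all three maps are indexed by grid position, they can be drawn on the same SOM layout and compared side by side. This is in the spirit of the cross-referencing between property visualizations that the abstract describes.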