Word-streams for representing context in word maps

Bibliographic Details
Main Authors: Azcarraga, Arnulfo P., Gopez, Alfred Kenneth S., Yap, Teddy, Jr.
Format: text
Published: Animo Repository 2007
Online Access: https://animorepository.dlsu.edu.ph/faculty_research/11960
Institution: De La Salle University
Description
Summary: The most prominent use of Self-Organizing Maps (SOMs) in text archiving and retrieval is WEBSOM. In WEBSOM, a map is first used to reduce the dimensionality of the huge term-frequency table by training a so-called word-category map. This word-category map is then used to convert the individual documents into their respective document signatures (i.e., histograms of words), which form the basis for training a document map. This document map is the final text archive. WEBSOM has been shown to be a powerful and versatile text archiving system. However, it spends (wastes) enormous computing resources on computing the left and right context of every word that appears in any document in the text corpus. This paper presents an alternative scheme for incorporating context in the encoding of words, one that takes full advantage of the computation of the probabilistic centroid that is inherent in the SOM training algorithm. Several experiments are conducted to compare this new scheme with WEBSOM's context-averaging scheme.
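
To make the cost of the context-averaging step concrete, the following is a minimal sketch of the kind of left/right context averaging the summary attributes to WEBSOM. It is an illustration under stated assumptions, not the authors' implementation: the random per-word code vectors, the vector dimension `DIM`, the concatenation order (left average, own code, right average), and the helper names `random_codes` and `context_average` are all hypothetical choices that do not appear in the record. The paper's alternative word-stream scheme is not described here in enough detail to sketch.

```python
import random
from collections import defaultdict

DIM = 4  # assumed code-vector dimension (hypothetical)

def random_codes(vocab, dim, seed=0):
    """Assign each word a fixed random code vector (an assumption of this sketch)."""
    rng = random.Random(seed)
    return {w: [rng.gauss(0.0, 1.0) for _ in range(dim)] for w in sorted(vocab)}

def context_average(docs, codes, dim):
    """Encode each word as [mean left-neighbour code, own code, mean right-neighbour code].

    Every occurrence of every word in the corpus is visited once, which is
    the per-occurrence cost the summary refers to.
    """
    left_sum = defaultdict(lambda: [0.0] * dim)
    right_sum = defaultdict(lambda: [0.0] * dim)
    left_n = defaultdict(int)
    right_n = defaultdict(int)

    for tokens in docs:
        for i, word in enumerate(tokens):
            if i > 0:  # accumulate the code of the word to the left
                for k, v in enumerate(codes[tokens[i - 1]]):
                    left_sum[word][k] += v
                left_n[word] += 1
            if i + 1 < len(tokens):  # accumulate the code of the word to the right
                for k, v in enumerate(codes[tokens[i + 1]]):
                    right_sum[word][k] += v
                right_n[word] += 1

    encoded = {}
    for word, code in codes.items():
        # Words never seen with a left (or right) neighbour get a zero context.
        left = [s / left_n[word] for s in left_sum[word]] if left_n[word] else [0.0] * dim
        right = [s / right_n[word] for s in right_sum[word]] if right_n[word] else [0.0] * dim
        encoded[word] = left + code + right
    return encoded

# Toy corpus: each document is a list of tokens.
docs = [["self", "organizing", "map"], ["self", "organizing"]]
codes = random_codes({w for d in docs for w in d}, DIM)
encoded = context_average(docs, codes, DIM)
```

In this sketch the vectors in `encoded` would serve as the inputs for training a word-category map. The double loop over documents and token positions is where the per-occurrence work accumulates as the corpus grows.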