Keyword extraction for very high dimensional datasets using random projection as key input representation scheme

Keywords are increasingly useful as users are faced with the challenge of keeping up with the voluminous information that they need to process every day. The most straightforward way to extract keywords is to compute the term frequencies for each document. But when dealing with corpora containin...

Bibliographic Details
Main Author: Dy, Jeric Bryle S.
Format: text
Language: English
Published: Animo Repository 2011
Subjects:
Online Access: https://animorepository.dlsu.edu.ph/etd_masteral/6649
https://animorepository.dlsu.edu.ph/context/etd_masteral/article/12928/viewcontent/CDTG004899_P.pdf
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:etd_masteral-12928
record_format eprints
spelling oai:animorepository.dlsu.edu.ph:etd_masteral-12928 2023-03-10T00:38:04Z 2011-02-01T08:00:00Z text application/pdf Master's Theses
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
topic Text processing (Computer science)
Dimension reduction (Statistics)
Document clustering
Computer Sciences
description Keywords are increasingly useful as users face the challenge of keeping up with the voluminous information that they need to process every day. The most straightforward way to extract keywords is to compute the term frequencies for each document. But when a corpus contains hundreds of thousands of unique terms, the space and computing time required to extract the most relevant terms as keywords severely limit the practical implementation of current keyword extraction techniques. As such, the frequency counts of the extracted terms need to be subjected to a data compression scheme. In this research, the random projection method is used to compress the extracted data, and the method allows various clustering and keyword extraction algorithms to be run directly on the compressed data. Several experiments are conducted to assess the effect of the random projection method on the quality and time-space efficiency of k-means clustering and term extraction.
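The pipeline outlined in the description (term frequencies, random projection, k-means on the compressed data) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the Gaussian projection matrix, the whitespace tokenizer, and the plain k-means loop are all assumptions chosen for illustration. In particular, the final step ranks terms per cluster from the uncompressed counts as a simplification, because the record does not say how terms are extracted from the compressed data.

```python
import numpy as np


def term_frequency_matrix(docs):
    """Build a dense documents-by-terms count matrix (whitespace tokens)."""
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    index = {term: j for j, term in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for tok in doc.lower().split():
            tf[i, index[tok]] += 1
    return tf, vocab


def random_projection(tf, k, seed=0):
    """Compress each d-dimensional row to k dimensions.

    Assumption: a Gaussian random matrix scaled by 1/sqrt(k), one common
    choice that roughly preserves pairwise distances.
    """
    rng = np.random.default_rng(seed)
    r = rng.standard_normal((tf.shape[1], k)) / np.sqrt(k)
    return tf @ r


def kmeans(x, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd-style k-means; returns one cluster label per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):  # leave empty clusters where they are
                centers[c] = x[labels == c].mean(axis=0)
    return labels


def top_terms(tf, vocab, labels, n=3):
    """Simplification: rank terms per cluster by summed raw frequency."""
    out = {}
    for c in np.unique(labels):
        totals = tf[labels == c].sum(axis=0)
        best = np.argsort(totals)[::-1][:n]
        out[int(c)] = [vocab[j] for j in best]
    return out


if __name__ == "__main__":
    docs = ["apple banana apple", "banana cherry", "dog cat dog", "cat mouse"]
    tf, vocab = term_frequency_matrix(docs)
    compressed = random_projection(tf, k=3)   # 6 term columns -> 3 columns
    labels = kmeans(compressed, n_clusters=2)  # clustering sees only compressed data
    print(top_terms(tf, vocab, labels, n=2))
```

On a real corpus the count matrix would be sparse and the target dimension far larger than 3; the toy sizes here only keep the sketch readable.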