Neural network-based keyword extraction using word frequency, position, usage and format features

Keywords have become integral to many Knowledge Management Systems, Information Retrieval Systems, and Digital Libraries. They have also become significant in commerce, specifically in providing contextual advertisements for online content. Not all text, however, is annotated with good...

Bibliographic Details
Main Author: Tensuan, Juan Paolo
Format: text
Language: English
Published: Animo Repository 2013
Subjects: Keyword searching; Back propagation (Artificial intelligence)
Online Access: https://animorepository.dlsu.edu.ph/etd_masteral/4386
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:etd_masteral-11224
record_format eprints
spelling oai:animorepository.dlsu.edu.ph:etd_masteral-11224 2021-01-18T08:00:20Z 2013-01-01T08:00:00Z text https://animorepository.dlsu.edu.ph/etd_masteral/4386 Master's Theses English Animo Repository Keyword searching Back propagation (Artificial intelligence)
institution De La Salle University
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
language English
topic Keyword searching
Back propagation (Artificial intelligence)
description Keywords have become integral to many Knowledge Management Systems, Information Retrieval Systems, and Digital Libraries. They have also become significant in commerce, specifically in providing contextual advertisements for online content. Not all text, however, is annotated with good keywords. Hence, it is important to be able to automatically extract keywords from documents. In this study, the researcher investigates the use of the Backpropagation Neural Network algorithm for keyword extraction from documents. The feasibility of using statistical features such as word frequency, positioning, and usage was further validated along with additional word formatting features. Rule extraction was performed to examine the relative importance of these statistical features for keyword extraction. Two corpora were used for experimentation: one composed of IEEE journal papers and the other of Wikipedia articles. With the exclusion of the TF-IDF feature, the addition of word format features, and post-calibration of the Backpropagation Neural Networks, the resulting models achieved G-Means of 0.75 and 0.77 for the IEEE journal papers and Wikipedia articles, respectively. Finally, analysis of the results showed that word formatting features were much more important to keyword extraction for Wikipedia articles than for IEEE journal papers, confirming the researcher's initial hypothesis that varying writing styles would affect the importance of these features.
format text
author Tensuan, Juan Paolo
title Neural network-based keyword extraction using word frequency, position, usage and format features
publisher Animo Repository
publishDate 2013
url https://animorepository.dlsu.edu.ph/etd_masteral/4386
_version_ 1772834483683196928
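
The abstract reports model quality as G-Means of 0.75 and 0.77 but this record does not define the metric. The sketch below assumes the definition commonly used for imbalanced classification, the geometric mean of sensitivity and specificity; the thesis may define it differently. The function name `g_mean` and all counts are hypothetical and chosen only for illustration; they are not taken from the thesis.

```python
import math


def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity and specificity (assumed definition).

    tp/fn count true keywords that were extracted/missed;
    tn/fp count non-keywords that were rejected/wrongly extracted.
    """
    # Guard against empty classes so the function never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical counts: high recall on keywords, weak rejection of non-keywords.
balanced_score = g_mean(tp=80, fn=20, tn=45, fp=55)  # sqrt(0.8 * 0.45), about 0.6

# A degenerate model that labels every word "not a keyword" scores zero,
# even though its plain accuracy on these counts would be 95%.
trivial_score = g_mean(tp=0, fn=50, tn=950, fp=0)

print(round(balanced_score, 3), trivial_score)
```

Under this assumed definition, a model cannot score well by favoring the majority class, which is one plausible reason to prefer it over accuracy when keywords are a small fraction of all candidate words.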