Neural network-based keyword extraction using word frequency, position, usage and format features
Main Author: | |
---|---|
Format: | text |
Language: | English |
Published: | Animo Repository, 2013 |
Subjects: | |
Online Access: | https://animorepository.dlsu.edu.ph/etd_masteral/4386 |
Institution: | De La Salle University |
Summary: | Keywords have become integral to many Knowledge Management Systems, Information Retrieval Systems, and Digital Libraries. They have also become significant in commerce, specifically in providing contextual advertisements for online content. Not all text, however, is annotated with good keywords. Hence, it is important to be able to extract keywords from documents automatically. In this study, the researcher investigates the use of the Backpropagation Neural Network algorithm for keyword extraction from documents. The feasibility of using statistical features such as word frequency, position, and usage was further validated, along with additional word formatting features. Rule extraction was performed to examine the relative importance of these statistical features for keyword extraction. Two corpora were used for experimentation: one composed of IEEE journal papers and the other of Wikipedia articles. With the exclusion of the TF-IDF feature, the addition of word format features, and post-calibration of the Backpropagation Neural Networks, the resulting models achieved G-Means of 0.75 and 0.77 for the IEEE journal papers and Wikipedia articles, respectively. Finally, analysis of the results showed that word formatting features were much more important to keyword extraction for Wikipedia articles than for IEEE journal papers, confirming the researcher's initial hypothesis that varying writing styles would affect the importance of these features. |
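
The summary reports model quality as a G-Mean but does not define it. As a rough illustration only, the sketch below assumes the definition commonly used for imbalanced binary classification, the geometric mean of sensitivity and specificity; the thesis may define it differently. The function name `g_mean`, the label encoding (1 = keyword, 0 = non-keyword), and the toy labels are all hypothetical and are not taken from the study.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for binary labels.

    Assumed encoding (hypothetical): 1 = keyword, 0 = non-keyword.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # keywords found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # keywords missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-keywords rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms

    # Guard against empty classes so the function never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy example (made-up labels): 2 true keywords among 8 candidate words.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 1, 0]
score = g_mean(y_true, y_pred)
```

Under this assumed definition, a classifier that labels every word as a non-keyword scores 0 despite high raw accuracy, which is one reason such a metric suits keyword extraction, where keywords are typically a small minority of the candidate words.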