Tagging documents using neural networks based on local word features

Keywords and key-phrases that concisely represent text documents are integral to many knowledge management and text information retrieval systems, as well as digital libraries in general. Not all text documents, however, are annotated with good keywords; and the quality of these keywords is often de...

Bibliographic Details
Main Authors:	Azcarraga, Arnulfo P.; Tensuan, Paolo; Setiono, Rudy
Format:	text
Published:	Animo Repository 2014
Subjects:	Automatic indexing; Text processing (Computer science); Neural networks (Computer science); Software Engineering
Online Access:	https://animorepository.dlsu.edu.ph/faculty_research/1910
Institution:	De La Salle University

id	oai:animorepository.dlsu.edu.ph:faculty_research-2909
record_format	eprints
spelling	oai:animorepository.dlsu.edu.ph:faculty_research-2909 2021-07-30T03:18:40Z Tagging documents using neural networks based on local word features Azcarraga, Arnulfo P. Tensuan, Paolo Setiono, Rudy Keywords and key-phrases that concisely represent text documents are integral to many knowledge management and text information retrieval systems, as well as digital libraries in general. Not all text documents, however, are annotated with good keywords; and the quality of these keywords is often dependent on a tedious, sometimes manual, extraction and tagging process. To automatically extract high quality keywords without the need for a semantic analysis of the document, it is shown that artificial neural networks (ANN) can be trained to only consider in-document word features such as word frequency, word distribution in document, use of word in special parts of the document, and use of word formatting features (i.e. bold-faced, italicized, large-font size). Results show that purely local features are adequate in determining whether a word in a document is a keyword or not. Classification performance yields a G-mean of at least 0.83, and weighted F-measure of 0.96 for both keywords and non-keywords. Precision for keywords alone, however, is not as high. To understand the basis for classifying keywords, C4.5 is used to extract rules from the ANN. The extracted rules from C4.5, in the form of a decision tree, show the relative importance of the different document features that were extracted. © 2014 IEEE. 2014-01-01T08:00:00Z text https://animorepository.dlsu.edu.ph/faculty_research/1910 Faculty Research Work Animo Repository Automatic indexing Text processing (Computer science) Neural networks (Computer science) Software Engineering
institution	De La Salle University
building	De La Salle University Library
continent	Asia
country	Philippines
content_provider	De La Salle University Library
collection	DLSU Institutional Repository
topic	Automatic indexing; Text processing (Computer science); Neural networks (Computer science); Software Engineering
description	Keywords and key-phrases that concisely represent text documents are integral to many knowledge management and text information retrieval systems, as well as digital libraries in general. Not all text documents, however, are annotated with good keywords; and the quality of these keywords is often dependent on a tedious, sometimes manual, extraction and tagging process. To automatically extract high quality keywords without the need for a semantic analysis of the document, it is shown that artificial neural networks (ANN) can be trained to only consider in-document word features such as word frequency, word distribution in document, use of word in special parts of the document, and use of word formatting features (i.e. bold-faced, italicized, large-font size). Results show that purely local features are adequate in determining whether a word in a document is a keyword or not. Classification performance yields a G-mean of at least 0.83, and weighted F-measure of 0.96 for both keywords and non-keywords. Precision for keywords alone, however, is not as high. To understand the basis for classifying keywords, C4.5 is used to extract rules from the ANN. The extracted rules from C4.5, in the form of a decision tree, show the relative importance of the different document features that were extracted. © 2014 IEEE.
format	text
author	Azcarraga, Arnulfo P.; Tensuan, Paolo; Setiono, Rudy
author_facet	Azcarraga, Arnulfo P.; Tensuan, Paolo; Setiono, Rudy
author_sort	Azcarraga, Arnulfo P.
title	Tagging documents using neural networks based on local word features
title_sort	tagging documents using neural networks based on local word features
publisher	Animo Repository
publishDate	2014
url	https://animorepository.dlsu.edu.ph/faculty_research/1910
_version_	1707059172159258624
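
Note on the reported metrics: the description above cites a G-mean, a weighted F-measure, and a lower keyword-only precision. The sketch below shows how these three figures can be computed from a binary confusion matrix and why they can diverge when keywords are rare. It is an illustration only, not the authors' evaluation code. The function name binary_metrics and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute per-class precision, G-mean, and support-weighted F-measure.

    tp/fn count true keywords predicted as keyword/non-keyword;
    fp/tn count true non-keywords predicted as keyword/non-keyword.
    Assumes every denominator below is non-zero.
    """
    # Keyword (positive) class.
    prec_kw = tp / (tp + fp)
    rec_kw = tp / (tp + fn)  # sensitivity
    f_kw = 2 * prec_kw * rec_kw / (prec_kw + rec_kw)

    # Non-keyword (negative) class.
    prec_non = tn / (tn + fn)
    rec_non = tn / (tn + fp)  # specificity
    f_non = 2 * prec_non * rec_non / (prec_non + rec_non)

    # G-mean: geometric mean of the two per-class recalls.
    g_mean = math.sqrt(rec_kw * rec_non)

    # Weighted F-measure: per-class F averaged by class support.
    n_kw, n_non = tp + fn, tn + fp
    weighted_f = (n_kw * f_kw + n_non * f_non) / (n_kw + n_non)

    return {
        "keyword_precision": prec_kw,
        "g_mean": g_mean,
        "weighted_f": weighted_f,
    }


# Hypothetical counts: 100 keywords among 1100 words (NOT the paper's data).
metrics = binary_metrics(tp=80, fp=50, fn=20, tn=950)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because non-keywords dominate the word population, the weighted F-measure is driven mostly by the non-keyword class, so it can remain high even when a modest number of false positives pulls keyword precision down. The G-mean weights the two class recalls equally and is therefore less affected by the imbalance.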

Similar Items