Enhanced ontology-based text classification algorithm for structurally organized documents

Text classification (TC) is an important foundation of information retrieval and text mining. The main task of TC is to predict a text's class from a set of tags defined in advance. Most TC algorithms represent a document by its terms, which does not consider the relations among th...

Bibliographic Details
Main Author: Oleiwi, Suha Sahib
Format: Thesis
Language: English
Published: 2015
Subjects:
Online Access: http://etd.uum.edu.my/5358/
Institution: Universiti Utara Malaysia
id my.uum.etd.5358
record_format eprints
spelling my.uum.etd.5358 2021-03-18T08:38:01Z http://etd.uum.edu.my/5358/ Enhanced ontology-based text classification algorithm for structurally organized documents Oleiwi, Suha Sahib QA Mathematics QA76.76 Fuzzy System. 2015 Thesis NonPeerReviewed text en /5358/1/s91731.pdf text en /5358/2/s91731_abstract.pdf Oleiwi, Suha Sahib (2015) Enhanced ontology-based text classification algorithm for structurally organized documents. PhD. thesis, Universiti Utara Malaysia.
institution Universiti Utara Malaysia
building UUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Utara Malaysia
content_source UUM Electronic Theses
url_provider http://etd.uum.edu.my/
language English
topic QA Mathematics
QA76.76 Fuzzy System.
description Text classification (TC) is an important foundation of information retrieval and text mining. The main task of TC is to predict a text's class from a set of tags defined in advance. Most TC algorithms represent a document by its terms, which ignores the relations among those terms. These algorithms represent documents in a space where every word is assumed to be a dimension. As a result, such representations are high-dimensional, which degrades classification performance. The objectives of this thesis are to formulate algorithms that classify text by creating suitable feature vectors and reducing the dimensionality of the data, with the aim of improving classification accuracy. This research combines ontology and text representation for classification by developing five algorithms. The first and second algorithms, Concept Feature Vector (CFV) and Structure Feature Vector (SFV), create feature vectors to represent the document. The third algorithm, Ontology Based Text Classification (OBTC), is designed to reduce the dimensionality of the training sets. The fourth and fifth algorithms, Concept Feature Vector_Text Classification (CFV_TC) and Structure Feature Vector_Text Classification (SFV_TC), assign the document to its related set of classes. The proposed algorithms were tested on five different scientific paper datasets downloaded from different digital libraries and repositories. Experimental results obtained from the proposed algorithms CFV_TC and SFV_TC showed better average precision, recall, F-measure and accuracy than the SVM and RSS approaches. The work in this study contributes to exploring related documents in information retrieval and text mining research by using ontology in TC.
format Thesis
author Oleiwi, Suha Sahib
title Enhanced ontology-based text classification algorithm for structurally organized documents
publishDate 2015
url http://etd.uum.edu.my/5358/