Enhanced ontology-based text classification algorithm for structurally organized documents
Text classification (TC) is an important foundation of information retrieval and text mining. The main task of TC is to predict a text's class according to the type of tag given in advance. Most TC algorithms represent the document by its terms, which does not consider the relations among th...
Main Author: | Oleiwi, Suha Sahib |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2015 |
Subjects: | QA Mathematics QA76.76 Fuzzy System. |
Online Access: | http://etd.uum.edu.my/5358/ |
Institution: | Universiti Utara Malaysia |
Language: | English |
id |
my.uum.etd.5358 |
---|---|
record_format |
eprints |
spelling |
my.uum.etd.5358 2021-03-18T08:38:01Z http://etd.uum.edu.my/5358/ Enhanced ontology-based text classification algorithm for structurally organized documents Oleiwi, Suha Sahib QA Mathematics QA76.76 Fuzzy System. 2015 Thesis NonPeerReviewed text en /5358/1/s91731.pdf text en /5358/2/s91731_abstract.pdf Oleiwi, Suha Sahib (2015) Enhanced ontology-based text classification algorithm for structurally organized documents. PhD. thesis, Universiti Utara Malaysia. |
institution |
Universiti Utara Malaysia |
building |
UUM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Utara Malaysia |
content_source |
UUM Electronic Theses |
url_provider |
http://etd.uum.edu.my/ |
language |
English |
topic |
QA Mathematics QA76.76 Fuzzy System. |
description |
Text classification (TC) is an important foundation of information retrieval and text mining. The main task of TC is to predict a text's class according to the type of tag given in advance. Most TC algorithms represent a document by its terms and do not consider the relations among those terms. These algorithms represent documents in a space where every word is assumed to be a dimension. As a result, such representations are high-dimensional, which has a negative effect on classification performance. The objectives of this thesis are to formulate algorithms for classifying text by creating suitable feature vectors and by reducing the dimension of the data, which will enhance classification accuracy. This research combines ontology and text representation for classification by developing five algorithms. The first and second algorithms, namely Concept Feature Vector (CFV) and Structure Feature Vector (SFV), create feature vectors to represent the document. The third algorithm, Ontology Based Text Classification (OBTC), is designed to reduce the dimensionality of training sets. The fourth and fifth algorithms, Concept Feature Vector_Text Classification (CFV_TC) and Structure Feature Vector_Text Classification (SFV_TC), classify the document into its related set of classes. The proposed algorithms were tested on five different scientific paper datasets downloaded from different digital libraries and repositories. Experimental results obtained from the proposed algorithms CFV_TC and SFV_TC showed better average results in terms of precision, recall, F-measure and accuracy compared against the SVM and RSS approaches. The work in this study contributes to exploring related documents in information retrieval and text mining research by using ontology in TC. |
format |
Thesis |
author |
Oleiwi, Suha Sahib |
title |
Enhanced ontology-based text classification algorithm for structurally organized documents |
publishDate |
2015 |
url |
http://etd.uum.edu.my/5358/ |
_version_ |
1695533671769964544 |
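
The abstract's point that a term-per-dimension representation is high-dimensional, and that an ontology can reduce it, can be illustrated with a minimal sketch. This is not the thesis's CFV, SFV or OBTC algorithm; the toy ontology `TERM_TO_CONCEPT`, the function names and the sample document are all hypothetical and chosen only to show the idea of mapping terms to shared concepts.

```python
from collections import Counter

# Hypothetical toy ontology: each term maps to a broader concept.
# Invented for illustration only; it is not the ontology used in the thesis.
TERM_TO_CONCEPT = {
    "svm": "classifier",
    "perceptron": "classifier",
    "bayes": "classifier",
    "precision": "metric",
    "recall": "metric",
    "accuracy": "metric",
}


def term_vector(tokens):
    """Term-based representation: one dimension per distinct term."""
    return Counter(tokens)


def concept_vector(tokens, ontology=TERM_TO_CONCEPT):
    """Concept-based representation: one dimension per ontology concept.

    Terms that are not in the ontology are dropped (a simplifying assumption).
    """
    return Counter(ontology[t] for t in tokens if t in ontology)


doc = "svm perceptron bayes precision recall accuracy svm".split()
terms = term_vector(doc)
concepts = concept_vector(doc)

print(len(terms), dict(terms))        # number of term dimensions and their counts
print(len(concepts), dict(concepts))  # number of concept dimensions and their counts
```

In this toy document the six distinct terms collapse into two concept dimensions, because related terms share one concept. A real ontology would also have to handle terms with several senses or no matching concept, which this sketch ignores.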