Application of clustering in managing unstructured textual data in relational database / Wael Mohamed Shaher Yafooz

Bibliographic Details
Main Author: Yafooz, Wael Mohamed Shaher
Format: Thesis (PhD)
Language: English
Published: 2014
Subjects: Electronic digital computers; Database management
Online Access:https://ir.uitm.edu.my/id/eprint/28040/1/28040.pdf
https://ir.uitm.edu.my/id/eprint/28040/
Institution: Universiti Teknologi MARA
id my.uitm.ir.28040
record_format eprints
datestamp 2024-01-19T01:27:03Z
citation Yafooz, Wael Mohamed Shaher (2014). Application of clustering in managing unstructured textual data in relational database. PhD thesis, Universiti Teknologi MARA. Not peer reviewed. Full text (PDF, English): http://terminalib.uitm.edu.my/28040.pdf
institution Universiti Teknologi MARA
building Tun Abdul Razak Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi MARA
content_source UiTM Institutional Repository
url_provider http://ir.uitm.edu.my/
language English
topic Electronic digital computers
Database management
description Heavy reliance on computers in everyday life has led to a continuous increase in large data applications in textual form. These data are stored in secondary storage for future use. Consequently, a relational database (RDB) is commonly used as the backbone of application software for organising such data into structured form. The RDB has robust and powerful structures for managing, organising and retrieving the data. However, the database structure can still contain large amounts of unstructured textual data. Handling unstructured textual data raises three basic issues: users have difficulty finding useful information, information retrieval is inaccurate, and query processing performance is insufficient. Attempts to resolve these issues have used several methods, such as full-text searching, text indexing, database schema management, database data models and query-based techniques. However, front-end approaches, in the form of software applications, are still needed to organise the unstructured textual information in the RDB. This study proposes a Textual Virtual Schema Model (TVSM) as a back-end approach to reorganising textual data inside relational databases while performing automatic semantic linking and cluster assignment. When new unstructured textual data are stored in the database, all words are extracted to uncover the underlying meaning of the data. The named entities and the most frequent terms are then selected as the factors used in cluster assignment. The model is tested and evaluated by embedding it as a component-based package within the internal structure of a relational database. Three experiments were conducted on the Reuters text corpus and the Classic and WAP datasets.
The clustering results were validated using the F-measure, entropy and purity measures, and compared with two common approaches, information extraction and textual document clustering, represented by methods such as K-means, Frequent Item-Set, Hierarchical Clustering Algorithms and Oracle Text. The results show linkages between structured textual data and unstructured information, improved quality in textual document clustering with accurate clusters, and high query-processing performance. Thus, the proposed technique can increase retrieval performance and produce highly accurate textual data clusters. The model offers a useful approach for various domains that involve big textual data, such as document clustering, topic detection and tracking, information integration, personal data management and information retrieval.
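The description names the F-measure, entropy and purity as validation measures but does not give their formulas. As a hedged illustration only, the Python sketch below implements the common textbook definitions used in document-clustering evaluation, given each document's assigned cluster and its reference class. The function names (`purity`, `entropy`, `f_measure`) are hypothetical, and the thesis may use a different variant (for example, another logarithm base or weighting scheme).

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def _contingency(clusters: Sequence[Hashable], labels: Sequence[Hashable]) -> Counter:
    """Count documents for every (cluster, reference class) pair."""
    if len(clusters) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty sequences")
    return Counter(zip(clusters, labels))


def purity(clusters: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Fraction of documents that belong to the majority class of their cluster (higher is better)."""
    counts = _contingency(clusters, labels)
    majority: dict = {}
    for (cluster, _), k in counts.items():
        majority[cluster] = max(majority.get(cluster, 0), k)
    return sum(majority.values()) / len(labels)


def entropy(clusters: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Size-weighted average of each cluster's class entropy, log base 2 (lower is better)."""
    counts = _contingency(clusters, labels)
    n = len(labels)
    total = 0.0
    for cluster, size in Counter(clusters).items():
        h = 0.0
        for (c, _), k in counts.items():
            if c == cluster:
                p = k / size
                h -= p * math.log2(p)
        total += (size / n) * h
    return total


def f_measure(clusters: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """For each class, take the best F1 over all clusters, then weight by class size (higher is better)."""
    counts = _contingency(clusters, labels)
    n = len(labels)
    cluster_sizes = Counter(clusters)
    total = 0.0
    for label, n_label in Counter(labels).items():
        best = 0.0
        for cluster, n_cluster in cluster_sizes.items():
            overlap = counts.get((cluster, label), 0)
            if overlap:
                precision = overlap / n_cluster
                recall = overlap / n_label
                best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_label / n) * best
    return total
```

For example, with clusters `[0, 0, 0, 1, 1, 1]` and reference classes `["a", "a", "b", "b", "b", "b"]`, purity is 5/6, entropy is about 0.459 and the F-measure is 88/105 (about 0.838). Higher purity and F-measure, and lower entropy, indicate clusters that agree more closely with the reference classes.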
format Thesis
author Yafooz, Wael Mohamed Shaher
title Application of clustering in managing unstructured textual data in relational database / Wael Mohamed Shaher Yafooz
publishDate 2014