An ontological framework for information extraction using fuzzy rule-based system and word sense disambiguation

Automatic information extraction (IE) from scientific resources published online (mainly semi-structured and unstructured), such as articles, proceedings and editorials, is among the most active areas of research in text mining. This information is essential for various reasons like tagging, searching, ind...

Bibliographic Details
Main Author:	Zaman, Gohar
Format:	Thesis
Language:	English
Published:	2021
Subjects:	T Technology (General)
Online Access:	http://eprints.uthm.edu.my/8418/1/24p%20GOHAR%20ZAMAN.pdf http://eprints.uthm.edu.my/8418/2/GOHAR%20ZAMAN%20COPYRIGHT%20DECLARATION.pdf http://eprints.uthm.edu.my/8418/3/GOHAR%20ZAMAN%20WATERMARK.pdf http://eprints.uthm.edu.my/8418/
Institution:	Universiti Tun Hussein Onn Malaysia

id	my.uthm.eprints.8418
record_format	eprints
spelling	my.uthm.eprints.8418 2023-02-26T07:17:55Z http://eprints.uthm.edu.my/8418/ An ontological framework for information extraction using fuzzy rule-based system and word sense disambiguation Zaman, Gohar T Technology (General) 2021-12 Thesis NonPeerReviewed text en http://eprints.uthm.edu.my/8418/1/24p%20GOHAR%20ZAMAN.pdf text en http://eprints.uthm.edu.my/8418/2/GOHAR%20ZAMAN%20COPYRIGHT%20DECLARATION.pdf text en http://eprints.uthm.edu.my/8418/3/GOHAR%20ZAMAN%20WATERMARK.pdf Zaman, Gohar (2021) An ontological framework for information extraction using fuzzy rule-based system and word sense disambiguation. Doctoral thesis, Universiti Tun Hussein Onn Malaysia.
institution	Universiti Tun Hussein Onn Malaysia
building	UTHM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Tun Hussein Onn Malaysia
content_source	UTHM Institutional Repository
url_provider	http://eprints.uthm.edu.my/
language	English
topic	T Technology (General)
spellingShingle	T Technology (General) Zaman, Gohar An ontological framework for information extraction using fuzzy rule-based system and word sense disambiguation
description	Automatic information extraction (IE) from scientific resources published online (mainly semi-structured and unstructured), such as articles, proceedings and editorials, is among the most active areas of research in text mining. The extracted information is essential for tagging, searching and indexing documents and for search engine optimization. Various techniques with considerable accuracy, among other merits, have been proposed in the literature. However, their efficiency is limited to domain-specific documents with static and well-defined formats, and their accuracy is significantly challenged by even a slight modification in the document format. Hence, so far no scheme is robust enough for the broader types, domains and formats of documents from diverse publishing societies. To address this issue, this research proposes an Ontological Framework for IE (OFIE) that uses a fuzzy rule-based system (FRBS) and an efficient word sense disambiguation (WSD) technique. The FRBS module performs the extraction by incorporating fuzzy regular expressions with an added tolerance factor determined experimentally. FRBS is applied to the XML and text-converted versions of the same input document to extract two streams. The WSD module then synthesizes both streams and yields an outcome that is promising semantically as well as syntactically. The domain is wide-ranging and comprises articles from well-known publishing services such as IEEE, ACM, Elsevier, Springer and a few others. Extensive experiments and comparison with state-of-the-art techniques show that the proposed scheme is robust to changes in format, extracts better information, and achieves precision, recall and F-score of 89.14%, 89.6% and 89%, respectively, in the testing phase. As an outcome, the extracted information can be stored in a digital library for archiving and retrieval by means of an extract, transform and load (ETL) process.
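The description mentions fuzzy regular expressions with a tolerance factor but does not define them in this record. The sketch below only illustrates the general idea of tolerant label matching and is not the thesis's method: it uses the standard-library `difflib.SequenceMatcher` similarity as a stand-in, and the names `fuzzy_find` and `tolerance` are hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_find(label, line, tolerance=0.2):
    """Locate `label` in `line`, allowing small deviations.

    Hypothetical helper: slides a window of len(label) over the line and
    scores each window with difflib's similarity ratio (0.0 to 1.0).
    A window is accepted when its score is at least 1 - tolerance.
    Returns ((start, end), score) for the best window, or None.
    """
    target = label.lower()
    text = line.lower()
    width = len(target)
    best_span, best_score = None, 0.0
    # max(1, ...) ensures one comparison even when the line is shorter than the label
    for start in range(max(1, len(text) - width + 1)):
        window = text[start:start + width]
        score = SequenceMatcher(None, target, window).ratio()
        if score > best_score:  # strict '>' keeps the leftmost of tied windows
            best_span, best_score = (start, start + width), score
    if best_score >= 1.0 - tolerance:
        return best_span, best_score
    return None


# A heading typeset as "Key words" still matches the label "Keywords",
# while an unrelated line is rejected.
print(fuzzy_find("Keywords", "Key words: fuzzy logic, ontology"))
print(fuzzy_find("Keywords", "Introduction to the topic"))
```

A larger `tolerance` accepts more format variation at the cost of more false matches, which is presumably why the description speaks of choosing the factor experimentally.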
format	Thesis
author	Zaman, Gohar
author_facet	Zaman, Gohar
author_sort	Zaman, Gohar
title	An ontological framework for information extraction using fuzzy rule-based system and word sense disambiguation
title_short	An ontological framework for information extraction using fuzzy rule-based system and word sense disambiguation
title_full	An ontological framework for information extraction using fuzzy rule-based system and word sense disambiguation
title_fullStr	An ontological framework for information extraction using fuzzy rule-based system and word sense disambiguation
title_full_unstemmed	An ontological framework for information extraction using fuzzy rule-based system and word sense disambiguation
title_sort	ontological framework for information extraction using fuzzy rule-based system and word sense disambiguation
publishDate	2021
url	http://eprints.uthm.edu.my/8418/1/24p%20GOHAR%20ZAMAN.pdf http://eprints.uthm.edu.my/8418/2/GOHAR%20ZAMAN%20COPYRIGHT%20DECLARATION.pdf http://eprints.uthm.edu.my/8418/3/GOHAR%20ZAMAN%20WATERMARK.pdf http://eprints.uthm.edu.my/8418/
_version_	1758952405222817792

Similar Items