Named entity recognition for Quranic text using rule based approaches
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: | Penerbit Universiti Kebangsaan Malaysia, 2022 |
Online Access: | http://journalarticle.ukm.my/20852/1/9.pdf http://journalarticle.ukm.my/20852/ https://www.ukm.my/apjitm/articles-issues |
Institution: | Universiti Kebangsaan Malaysia |
Summary: | Textual data differ widely across domains, so Natural Language Processing components require
customization, especially Named Entity Recognition (NER), where different domains contain different types of
entities. Current NER models are deemed unfit to accurately extract entities from Quranic text because of its unique
content. This paper describes the building of a rule-based NER method that extracts the entities found in the
English translation of the meaning of the Quranic text, together with its performance evaluation. Named entity
tagging is a common text annotation task in which entities (nouns) in unstructured text are identified and
assigned a class. A few rules are built to extract several types of entities, such as the names of prophets and
people, creation, location, time, and the various names of God. The rules are built mainly using regular expressions
and gazetteers. They achieve high precision and recall, with an F-score of over 90%. The results of this experiment
can be used as annotations for building a machine learning model that extracts entities from text in the same
domain, specifically Quranic text, or more generally from Islamic-domain text. |
---|
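
The summary above mentions rules built from regular expressions and gazetteers, evaluated with precision, recall, and F-score. A minimal sketch of that general idea follows. It is not the authors' rule set: the gazetteer entries, the `TIME` pattern, the label names, and the function names are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical gazetteers: tiny illustrative lists, not the paper's resources.
GAZETTEERS = {
    "PROPHET": ["Moses", "Abraham", "Noah"],
    "LOCATION": ["Egypt", "Mount Sinai"],
    "GOD": ["Allah", "the Most Merciful"],
}

# Hypothetical regular-expression rule for one kind of time expression.
REGEX_RULES = {
    "TIME": re.compile(r"\bthe Day of \w+\b"),
}


def build_gazetteer_patterns(gazetteers):
    """Compile one alternation pattern per entity class."""
    patterns = {}
    for label, names in gazetteers.items():
        # Longest names first so multi-word entries win over shorter ones.
        ordered = sorted(names, key=len, reverse=True)
        alternatives = "|".join(re.escape(name) for name in ordered)
        patterns[label] = re.compile(r"\b(?:" + alternatives + r")\b")
    return patterns


def tag_entities(text):
    """Return (start, end, surface form, label) tuples sorted by position."""
    patterns = build_gazetteer_patterns(GAZETTEERS)
    patterns.update(REGEX_RULES)
    found = []
    for label, pattern in patterns.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.end(), match.group(0), label))
    return sorted(found)


def precision_recall_f1(predicted, gold):
    """Score predicted entities against gold annotations by exact match."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    sample = "And Allah spoke to Moses at Mount Sinai on the Day of Resurrection."
    for start, end, surface, label in tag_entities(sample):
        print(f"{label:9s} {surface} [{start}:{end}]")
```

Exact-match scoring is the simplest choice here. An evaluation that gives credit for partial span overlap would need a different comparison than the set intersection used in `precision_recall_f1`.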