Resolusi anafora artikel Bahasa Melayu berasaskan pengetahuan terhad dan kelas semantik

Anaphora resolution (AR) is the process of resolving the entity to which a pronominal anaphor refers. Anaphora occurs in every language, and resolving it requires human experts or specific rules. AR can improve language processing applications such as question answering, text mining, do...

Bibliographic Details
Main Author: Noorhuzaimi@Karimah, Mohd Noor
Format: Thesis
Language:English
Published: 2016
Subjects: PL Languages and literatures of Eastern Asia, Africa, Oceania
Online Access: http://umpir.ump.edu.my/id/eprint/25341/1/Resolusi%20anafora%20artikel%20Bahasa%20Melayu%20berasaskan%20pengetahuan%20terhad.pdf
http://umpir.ump.edu.my/id/eprint/25341/
Institution: Universiti Malaysia Pahang
id my.ump.umpir.25341
record_format eprints
spelling my.ump.umpir.25341 2021-07-28T03:18:07Z NonPeerReviewed pdf en Noorhuzaimi@Karimah, Mohd Noor (2016) Resolusi anafora artikel Bahasa Melayu berasaskan pengetahuan terhad dan kelas semantik. PhD thesis, Universiti Kebangsaan Malaysia.
institution Universiti Malaysia Pahang
building UMP Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Pahang
content_source UMP Institutional Repository
url_provider http://umpir.ump.edu.my/
language English
topic PL Languages and literatures of Eastern Asia, Africa, Oceania
description Anaphora resolution (AR) is the process of resolving the entity to which a pronominal anaphor refers. Anaphora occurs in every language, and resolving it requires human experts or specific rules. AR can improve language processing applications such as question answering, text mining, document summarization, and information extraction. Various studies have been carried out on AR, but most of them target languages such as English, Japanese and Norwegian; very little research has focused on AR for the Malay language. The aim of this research is therefore to resolve anaphora in Malay text using a knowledge-poor approach and a semantic class labelling model. To achieve this aim, a Malay AR framework was developed as a guide for resolving anaphora in Malay. The type of usage of the pronoun nya is determined using a set of rules, a set of similar words, and word filtering generated from the semantic class labelling model. This step is important because nya is the most frequently used pronoun in Malay text, amounting to 68% of usage, compared with other pronouns, whose use mostly depends on the sociological status of the referring entity or antecedent. Determining the antecedent candidates is another important step. Antecedent candidates can be proper nouns or other nouns. To determine which proper nouns are suitable candidates, two main processes are needed: (1) entity recognition for proper nouns that contain the word 'dan' ("and") and the comma symbol (,); and (2) assignment of a semantic label to each retrieved candidate in order to determine its sociological status. The research used part of the name gazetteers for people, organizations, locations and positions. Testing was conducted on 60 Malay articles containing different classes of proper nouns, and the results were compared with benchmark data tagged by a Malay linguist. The results show average precision and recall of 85% and 90% respectively. The proposed knowledge-poor AR framework for Malay text increases the success rate by 18.79% compared with the generic approach proposed by Mitkov and Lappin.
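The two candidate-determination steps named in the description (handling proper nouns joined by 'dan' or commas, then assigning a semantic label from a gazetteer) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the method used in the thesis: the gazetteer entries, the label names, and the functions split_candidates and label_candidates are all hypothetical, and the record does not describe the actual rules.

```python
import re

# Hypothetical toy gazetteer (assumption): the thesis's real gazetteers for
# people, organizations, locations and positions are not given in this record.
GAZETTEER = {
    "Ali": "person",
    "Siti": "person",
    "Kuantan": "location",
    "Kementerian Sains dan Teknologi": "organization",
}

# Split on a comma (optionally followed by 'dan') or on a standalone 'dan'.
SPLIT_RE = re.compile(r"\s*,\s*(?:dan\s+)?|\s+dan\s+")


def split_candidates(span):
    """Split a coordinated proper-noun span into individual candidates.

    A span that is itself a gazetteer entry is kept whole, so that a name
    containing 'dan' is not broken apart.
    """
    span = span.strip()
    if span in GAZETTEER:
        return [span]
    return [part.strip() for part in SPLIT_RE.split(span) if part.strip()]


def label_candidates(span):
    """Pair each candidate with its gazetteer label, or 'unknown'."""
    return [(name, GAZETTEER.get(name, "unknown"))
            for name in split_candidates(span)]


if __name__ == "__main__":
    print(label_candidates("Ali, Siti dan Kuantan"))
    print(label_candidates("Kementerian Sains dan Teknologi"))
```

Checking the whole span against the gazetteer first is a deliberate simplification. A multi-word name containing 'dan' that appears inside a longer coordinated list would still be split incorrectly, so a fuller treatment would presumably need longest-match lookup or additional rules.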
format Thesis
author Noorhuzaimi@Karimah, Mohd Noor
title Resolusi anafora artikel Bahasa Melayu berasaskan pengetahuan terhad dan kelas semantik
publishDate 2016