A Tagalog morphological analyzer using example-based approach

Bibliographic Details
Main Author: See, Solomon Lim
Format: text
Language: English
Published: Animo Repository 2006
Online Access: https://animorepository.dlsu.edu.ph/etd_masteral/3395
Institution: De La Salle University
Description
Summary: Example-based morphological analyzer (MA) approaches learn a language's morphology from a set of examples. Research in this area aims to address the time-consuming and costly development of rule-based MAs. Most of this research, however, centers on concatenative morphology, and little work has been done on non-concatenative morphology because of its complexity. Tagalog is an example of a language that exhibits non-concatenative morphology. Some example-based MAs that are able to handle such morphologies model the phenomena of infixation and reduplication incorrectly. An example-based MA that learns string rewrite rules from word pairs was developed to handle the different morphological phenomena in Tagalog, namely prefixation, infixation, suffixation, circumfixation, internal vowel changes, and partial and whole-word reduplication, together with its morphotactic rules. The model was evaluated against a Filipino lexicon because Filipino is composed mainly of Tagalog words, adopts Tagalog morphology, and is commonly used in the Philippines. The model was tested using ten-fold cross-validation with 40,272 word pairs. The developed model performs better on words exhibiting infixation and reduplication and achieves an accuracy of 90% for both derivational and inflectional morphology, compared with 88% for the original model. Analysis time, on the other hand, increased from 11 minutes with the original model to 35 minutes with the developed model. The developed model can be used to discover affixes and their associated morphological categories in other languages that exhibit the same morphological phenomena. The current limitation of the model is that it cannot properly model agglutination; a solution that considers syllabication and phonology for word alignment is recommended to further improve its performance.
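
To illustrate what learning a string rewrite rule from a word pair can look like, here is a minimal Python sketch. It is not the thesis's algorithm: the names `Rule`, `learn_rule`, and `apply_rule` are hypothetical, the alignment is a naive split search over the root, and circumfixation, internal vowel changes, and morphotactics are left out. It only shows how one (root, derived) pair might yield a rule that transfers to another root.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    """Hypothetical rewrite rule learned from one (root, derived) pair."""
    kind: str      # "prefix", "infix", "suffix", "partial_redup", "whole_redup"
    position: int  # index in the root where the material is inserted
    material: str  # the inserted string


def learn_rule(root: str, derived: str) -> Optional[Rule]:
    # Whole-word reduplication, written with or without a hyphen.
    if derived in (root + root, root + "-" + root):
        return Rule("whole_redup", len(root), derived[len(root):])

    extra = len(derived) - len(root)
    if extra <= 0:
        return None  # this sketch only handles rules that add material

    # Try every split of the root; take the earliest one where the derived
    # form equals root[:i] + material + root[i:].
    for i in range(len(root) + 1):
        head, tail = root[:i], root[i:]
        if derived.startswith(head) and derived.endswith(tail):
            material = derived[i:i + extra]
            if i == 0:
                # A prefix that copies the start of the root is treated
                # as partial reduplication rather than a fixed affix.
                kind = "partial_redup" if root.startswith(material) else "prefix"
            elif i == len(root):
                kind = "suffix"
            else:
                kind = "infix"
            return Rule(kind, i, material)
    return None  # e.g. circumfixation or vowel change: not covered here


def apply_rule(rule: Rule, root: str) -> str:
    if rule.kind == "whole_redup":
        sep = "-" if rule.material.startswith("-") else ""
        return root + sep + root
    if rule.kind == "partial_redup":
        # Copy as many leading characters as the learned example copied.
        return root[:len(rule.material)] + root
    return root[:rule.position] + rule.material + root[rule.position:]


if __name__ == "__main__":
    infix = learn_rule("sulat", "sumulat")
    redup = learn_rule("sulat", "susulat")
    print(infix, apply_rule(infix, "basa"))
    print(redup, apply_rule(redup, "basa"))
```

Taking the earliest split places the infix after the first character of the root, which matches pairs such as sulat/sumulat but is only a heuristic; as the summary notes, alignment informed by syllabication and phonology would be needed for a more faithful model.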