Sentence-level morphological and phonological analyzer for Filipino (filSPAM)
Main Authors: | , , , |
---|---|
Format: | text |
Language: | English |
Published: | Animo Repository, 2011 |
Subjects: | |
Online Access: | https://animorepository.dlsu.edu.ph/etd_bachelors/11167 |
Institution: | De La Salle University |
Summary: | Morphological analysis is an important process in natural language processing. It identifies the root word and the affixes (morphemes) of a morphed word. Phonology is another facet of morphology that concerns how a word is voiced or sounded out. Various approaches and systems, such as MACTag, are used in morphological analysis to generate rules for different languages. They differ in how they identify and classify morphemes and in how they handle ambiguity. Although there are systems that handle morphology for Filipino, most are limited in that they operate only at the word level and do not cover rules for phonology. Part-of-speech tagging is an integral part of sentence analysis that annotates each word in a sentence with its part of speech. Existing tools for part-of-speech tagging include HATPOST. These components, namely the morphological analyzer and the part-of-speech tagger, function independently of one another, and each has its own limitations that need to be addressed. This research constructs a sentence-level morphological and phonological analyzer for the Filipino language that integrates these components in order to identify the part of speech of each Filipino word in a sentence and to generate the root word and phonology of the identified words. filSPAM (Sentence-level Phonological and Morphological Analyzer for Filipino) analyzes a given Filipino sentence and generates the corresponding part of speech, root word, and phonology for that sentence. The system has four modules: the POS tagger, which has 54% accuracy; the morphological analyzer, which has 73.02% accuracy; the phonological analyzer, which is corpus-based; and the unknown handler, which has two functions, the automaton and the generalized tree, with 67% and 64% accuracy respectively. |
---|