A stemming algorithm for Tagalog words
Main Author: | |
---|---|
Format: | text |
Language: | English |
Published: | Animo Repository, 2003 |
Subjects: | |
Online Access: | https://animorepository.dlsu.edu.ph/etd_masteral/3111 |
Institution: | De La Salle University |
Summary: | Tag-SA, a Tagalog Stemming Algorithm, was developed to handle all forms of Tagalog words. It can be used for morphological analysis to derive root words, and it can also be applied to information retrieval (IR) to conflate different word forms to a common canonical form. It works by iterative affix removal and is context-sensitive. The system was tested and evaluated by error counting on 6,382 word variants drawn from three sources (duplicates included). The resulting understemming error of less than 15% and overstemming error of less than 0.005% indicate good performance for Tag-SA. |
---|
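This record does not list Tag-SA's actual affix rules or its context checks. As a rough illustration only, the Python sketch below shows the general technique the summary names (iterative affix removal over prefixes, infixes, reduplication and suffixes) together with one common way to turn stemming output into understemming and overstemming rates (Paice-style pair counting). Everything here is hypothetical and not taken from the thesis: the affix lists, the `MIN_ROOT` length guard that stands in for context sensitivity, and the function names `stem` and `error_rates`. The thesis's own error-counting procedure may differ.

```python
from collections import Counter
from math import comb

# Hypothetical, deliberately tiny affix inventories (NOT Tag-SA's actual rules).
# Prefixes are listed longest-first so "nag" is tried before "na".
PREFIXES = ["nakapag", "makapag", "pinag", "ipag", "mag", "nag", "pag",
            "ma", "na", "pa", "ka", "i"]
SUFFIXES = ["han", "hin", "an", "in"]
INFIXES = ("um", "in")
VOWELS = set("aeiou")
MIN_ROOT = 3  # crude stand-in for context checks: never leave fewer than 3 letters


def strip_prefix(word):
    for p in PREFIXES:
        rest = word[len(p):].lstrip("-")  # "mag-aaral" -> "aaral"
        if word.startswith(p) and len(rest) >= MIN_ROOT:
            return rest
    return word


def strip_infix(word):
    # -um- / -in- right after an initial consonant: "sumulat" -> "sulat"
    if len(word) > 3 and word[0] not in VOWELS and word[1:3] in INFIXES:
        cand = word[0] + word[3:]
        if len(cand) >= MIN_ROOT:
            return cand
    return word


def strip_reduplication(word):
    if (len(word) >= 4 and word[0] not in VOWELS and word[1] in VOWELS
            and word[:2] == word[2:4]):
        cand = word[2:]  # CV reduplication: "susulat" -> "sulat"
    elif len(word) >= 2 and word[0] in VOWELS and word[0] == word[1]:
        cand = word[1:]  # V reduplication: "aaral" -> "aral"
    else:
        return word
    return cand if len(cand) >= MIN_ROOT else word


def strip_suffix(word):
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= MIN_ROOT:
            return word[:-len(s)]
    return word


def stem(word):
    """Apply all strippers repeatedly until a full pass changes nothing."""
    word = word.lower()
    while True:
        new = strip_suffix(strip_reduplication(strip_infix(strip_prefix(word))))
        if new == word:
            return word
        word = new  # every change shortens the word, so the loop terminates


def error_rates(groups):
    """Paice-style pair counts; `groups` maps a gold root to its variants.

    UI = same-root pairs given different stems / all same-root pairs
    OI = different-root pairs given the same stem / all different-root pairs
    """
    pairs = [(root, stem(w)) for root, words in groups.items() for w in words]
    same_both = sum(comb(n, 2) for n in Counter(pairs).values())
    same_root = sum(comb(n, 2) for n in Counter(r for r, _ in pairs).values())
    same_stem = sum(comb(n, 2) for n in Counter(s for _, s in pairs).values())
    diff_root = comb(len(pairs), 2) - same_root
    ui = (same_root - same_both) / same_root if same_root else 0.0
    oi = (same_stem - same_both) / diff_root if diff_root else 0.0
    return ui, oi


sample = {
    "sulat": ["sumulat", "nagsusulat", "pinagsulatan"],
    "kain": ["kumain", "kinain", "kainan"],
    "aral": ["mag-aaral"],
    "basa": ["basahin"],
}
print({w: stem(w) for words in sample.values() for w in words})
print(error_rates(sample))  # understemming index > 0: "kainan" is mis-stemmed
```

With these toy rules, "kainan" loses "ka" and then "i" and ends up as "nan" instead of "kain", so the "kain" group is split and the understemming index comes out above zero while the overstemming index stays at zero. Counting pairs through `Counter` rather than comparing every pair of words keeps the evaluation fast enough for a word list the size of the 6,382-variant test set.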