TPOST: A template-based, n-gram part-of-speech tagger for Tagalog

TPOST is a template-based n-gram part-of-speech (POS) tagger for Tagalog. It is designed for languages with few, non-comprehensive lexical resources. The key to the algorithm is to use carefully chosen basic words and the fundamental features of word construction, in tagging itself and in...

Bibliographic Details
Main Author: Rabo, Vlamir S.
Format: text
Language: English
Published: Animo Repository 2004
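Subjects: Tagalog language--Parts of speech; Natural language processing (Computer science); Computer Sciences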
Online Access: https://animorepository.dlsu.edu.ph/etd_masteral/4788
https://animorepository.dlsu.edu.ph/cgi/viewcontent.cgi?article=11626&context=etd_masteral
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:etd_masteral-11626
record_format eprints
spelling oai:animorepository.dlsu.edu.ph:etd_masteral-11626 2022-06-15T08:08:23Z TPOST: A template-based, n-gram part-of-speech tagger for Tagalog Rabo, Vlamir S. 2004-12-11T08:00:00Z text application/pdf https://animorepository.dlsu.edu.ph/etd_masteral/4788 https://animorepository.dlsu.edu.ph/cgi/viewcontent.cgi?article=11626&context=etd_masteral Master's Theses English Animo Repository Tagalog language--Parts of speech Natural language processing (Computer science) Computer Sciences
institution De La Salle University
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
language English
topic Tagalog language--Parts of speech
Natural language processing (Computer science)
Computer Sciences
description TPOST is a template-based n-gram part-of-speech (POS) tagger for Tagalog. It is designed for languages with few, non-comprehensive lexical resources. The key to the algorithm is to use carefully chosen basic words and the fundamental features of word construction, both in tagging the words themselves and in disambiguating and resolving the unknown words around them. TPOST was trained on 1983 words with 450 distinct features, taken from the first three chapters of the Book of Philippians. The training text was manually tagged by a linguist and high school Filipino teachers, using 59 tags classified under 10 major POS tags. The tagger was tested in the same domain on 539 words with 221 distinct word features, and achieved error rates below 8% and 11% for general and specific errors, respectively. It was also tested on a different corpus, from the domain of children's story books, consisting of 1093 words with 397 distinct word features; this test yielded error rates below 17% and 23% for general and specific errors, respectively. Several variations were also tested, which further reduced the errors, making the TPOST algorithm a good foundation for further research on POS tagging in NLP.
format text
author Rabo, Vlamir S.
title TPOST: A template-based, n-gram part-of-speech tagger for Tagalog
publisher Animo Repository
publishDate 2004
url https://animorepository.dlsu.edu.ph/etd_masteral/4788
https://animorepository.dlsu.edu.ph/cgi/viewcontent.cgi?article=11626&context=etd_masteral