SMTPOST: Using statistical machine translation approach in Filipino part-of-speech tagging

The field of Natural Language Processing (NLP) in the Philippines has been continually developing. However, the transition from Tagalog to the evolving Filipino language left tools and resources behind. This paper introduces a Statistical Machine Translation Part-of-Speech (POS) Tagger for Filipin...

Bibliographic Details
Main Authors: Nocon, Nicco Louis S., Borra, Allan
Format: text
Published: Animo Repository 2016
Subjects:
Online Access: https://animorepository.dlsu.edu.ph/faculty_research/540
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:faculty_research-1539
record_format eprints
spelling oai:animorepository.dlsu.edu.ph:faculty_research-1539 2022-07-07T02:49:19Z SMTPOST: Using statistical machine translation approach in Filipino part-of-speech tagging Nocon, Nicco Louis S. Borra, Allan The field of Natural Language Processing (NLP) in the Philippines has been continually developing. However, the transition from Tagalog to the evolving Filipino language left tools and resources behind. This paper introduces a Statistical Machine Translation Part-of-Speech (POS) Tagger for Filipino (SMTPOST), with the purpose of reviving, updating and widening the scope of technologies in the POS tagging domain to accommodate the changes in the Filipino language. The resources built consist mainly of a tagset (218 tags), a parallel corpus (2,668 sentences), affix rules (59 rules) and a word-tag dictionary (309 entries). SMTPOST was tested on different tagsets and domains, achieving a highest accuracy of 84.75%, an increase of at least 3.75% over the available Tagalog POS taggers. Despite SMTPOST's use of Filipino resources and its good performance, there is room for improvement. Recommendations include a better feature extractor (preferably a morphological analyzer), a wider scope for all of the resources, implementation of pre- and/or postprocessing, and the application of SMTPOST research to other NLP applications. 2016-01-01T08:00:00Z text https://animorepository.dlsu.edu.ph/faculty_research/540 Faculty Research Work Animo Repository Computational linguistics Filipino language—Machine translating Tagalog language—Machine translating Computer Sciences
institution De La Salle University
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
topic Computational linguistics
Filipino language—Machine translating
Tagalog language—Machine translating
Computer Sciences
description The field of Natural Language Processing (NLP) in the Philippines has been continually developing. However, the transition from Tagalog to the evolving Filipino language left tools and resources behind. This paper introduces a Statistical Machine Translation Part-of-Speech (POS) Tagger for Filipino (SMTPOST), with the purpose of reviving, updating and widening the scope of technologies in the POS tagging domain to accommodate the changes in the Filipino language. The resources built consist mainly of a tagset (218 tags), a parallel corpus (2,668 sentences), affix rules (59 rules) and a word-tag dictionary (309 entries). SMTPOST was tested on different tagsets and domains, achieving a highest accuracy of 84.75%, an increase of at least 3.75% over the available Tagalog POS taggers. Despite SMTPOST's use of Filipino resources and its good performance, there is room for improvement. Recommendations include a better feature extractor (preferably a morphological analyzer), a wider scope for all of the resources, implementation of pre- and/or postprocessing, and the application of SMTPOST research to other NLP applications.
format text
author Nocon, Nicco Louis S.
Borra, Allan
title SMTPOST: Using statistical machine translation approach in Filipino part-of-speech tagging
publisher Animo Repository
publishDate 2016
url https://animorepository.dlsu.edu.ph/faculty_research/540
_version_ 1738854791019233280