Text translation: Template extraction for a bidirectional English-Filipino example-based machine translation

Bibliographic Details
Main Authors: Go, Kathleen L., Morga, Manimin R., Nunez, Vince Andrew D., Veto, Francis Germiline S.
Format: text
Language: English
Published: Animo Repository 2006
Online Access: https://animorepository.dlsu.edu.ph/etd_bachelors/14396
Institution: De La Salle University
Description
Summary: A bidirectional English-Filipino Example-based Machine Translation System that learns and uses templates is presented. The system uses machine learning techniques to first extract templates from a given bilingual corpus. These templates are subsequently used to translate English input text into Filipino and vice versa. The system implements the similarity template learning algorithm of Cicekli et al. (2001) but goes further by introducing template refinement and the derivation of templates from learned chunks. To improve translation quality, new chunk alignment and splitting algorithms are introduced into the training process, while a flexible template and chunk matching scheme is established for translation. Test results verify that a strict chunk alignment scheme is needed in training and that specific words, such as commonly occurring words, need to be filtered out to produce better templates. These measures improve overall quality by assuring complete template and chunk correctness in training and by reducing word and sentence error rates by as much as half in translation. Tests also show that the translation with the highest score among the candidates is consistently the best choice when checked against automatic evaluation methods. Still, much of the system implementation is limited by the quality and coverage of the lexicon and morphological references, which are patterned after those of TWiRL, a rule-based machine translator. This research is part of a three-year project on hybrid machine translation that is funded by the Philippine Council for Advanced Science and Technology Research and Development of the Department of Science and Technology (DOST-PCASTRD).
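The idea behind similarity template learning can be illustrated with a minimal sketch. This is not the system's implementation: it assumes whitespace tokenization, handles only the simplest case of exactly one differing span per language, and uses Python's difflib as a stand-in for the matching step. The function names, the `X1` variable marker, and the two example sentence pairs are hypothetical and are not taken from the thesis or its corpus.

```python
from difflib import SequenceMatcher


def split_similar_different(a, b):
    """Compare two token lists.

    Returns (template, diffs): shared tokens are kept in `template`, each
    differing span is replaced by a None placeholder, and `diffs` holds the
    pair of differing token spans for every placeholder.
    """
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    template, diffs = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            template.extend(a[i1:i2])
        else:
            template.append(None)
            diffs.append((a[i1:i2], b[j1:j2]))
    return template, diffs


def learn_similarity_template(pair1, pair2):
    """Derive one template and two aligned chunks from two translation examples.

    Hypothetical simplification: a template is produced only when each
    language shows exactly one difference and at least one shared token.
    Otherwise None is returned.
    """
    (src1, tgt1), (src2, tgt2) = pair1, pair2
    src_tmpl, src_diffs = split_similar_different(src1.split(), src2.split())
    tgt_tmpl, tgt_diffs = split_similar_different(tgt1.split(), tgt2.split())

    if len(src_diffs) != 1 or len(tgt_diffs) != 1:
        return None  # ambiguous alignment; a real system needs more machinery
    if len(src_tmpl) < 2 or len(tgt_tmpl) < 2:
        return None  # nothing in common, so no template can be learned

    def render(tmpl):
        return " ".join("X1" if tok is None else tok for tok in tmpl)

    template = (render(src_tmpl), render(tgt_tmpl))
    # The single differing span on each side is assumed to be mutually aligned.
    chunks = [
        (" ".join(src_diffs[0][0]), " ".join(tgt_diffs[0][0])),
        (" ".join(src_diffs[0][1]), " ".join(tgt_diffs[0][1])),
    ]
    return template, chunks


if __name__ == "__main__":
    result = learn_similarity_template(
        ("I bought the book", "Binili ko ang libro"),
        ("I bought the pencil", "Binili ko ang lapis"),
    )
    print(result)
```

Because the learned template pairs an English pattern with a Filipino pattern, the same template can in principle be applied in either translation direction by substituting a known chunk for the variable.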