Combining similarity and difference templates for a bidirectional example-based machine translation

Bibliographic Details
Main Author: Nuñez, Vince Andrew D.
Format: text
Language: English
Published: Animo Repository 2007
Online Access: https://animorepository.dlsu.edu.ph/etd_masteral/3524
https://animorepository.dlsu.edu.ph/context/etd_masteral/article/10362/viewcontent/CDTG004329_P.pdf
Institution: De La Salle University
Description
Summary: The research presents TExt-2, an extension of TExt Translation that learns and uses both similarity and difference templates. The system learns templates from a sentence-aligned bilingual corpus, and template extraction is based only on the surface form of the text. An alignment method is used to find correspondences between the source and target sentences.

Combining difference templates, similarity templates, and Chunk Refinement increased the number of templates learned by 60 and the number of chunks learned by 43 compared to TExt Translation. When the training corpus itself was translated, TExt-2 translated 12 more sentences correctly than TExt Translation. When the sentences to be translated contained patterns not encountered during training, TExt-2 also achieved a word error rate (WER) 0.21% lower, a CWP 0.94% higher, and a BLEU score 0.0344 higher than TExt Translation.

When the test corpus was built from the patterns learned by an STTL system, TExt Translation scored better: its WER was lower by 1.68%, its sentence error rate (SER) lower by 13.33%, its CWP higher by 1.13%, and its BLEU score higher by 0.0344. This is because the sentences to be translated matched the templates learned through STTL, the algorithm TExt Translation uses. In a manual evaluation of the same corpus, however, the two systems tied, indicating that even where they produced different translations, those translations had the same meaning.

To improve the system's performance, a large lexicon and a full morphological analyzer could be added, which would remove the need to enter words into the lexicon manually every time a new corpus is used. Semantic analysis could be added to improve the selection of templates and chunks. The DTC process could also be restricted in what it learns, since generalizing common words sometimes produces templates that are difficult to match and use.
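The abstract does not spell out how similarity and difference templates are formed. In the general approach of learning translation templates from examples, two translation examples are compared: a similarity template keeps the fragments the examples share and replaces the differing fragments with variables, while a difference template keeps the differing fragments and replaces the shared ones with variables. The Python sketch that follows illustrates that idea under assumptions that are not from the thesis: Python's `difflib.SequenceMatcher` stands in for the matching step, the k-th source difference is assumed to correspond to the k-th target difference (instead of the thesis's alignment method), and the toy English/Filipino example pair and the helper names `match_sequence`, `template`, and `learn` are hypothetical.

```python
from difflib import SequenceMatcher

# (is_similarity, tokens from the first example, tokens from the second example)
Segment = tuple[bool, list[str], list[str]]


def match_sequence(a: list[str], b: list[str]) -> list[Segment]:
    """Split two token lists into alternating similarities and differences.

    Assumption: SequenceMatcher's opcodes stand in for the thesis's matcher.
    """
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [(tag == "equal", a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


def template(segments: list[Segment], keep_similar: bool, which: int) -> list[str]:
    """Keep one kind of segment and replace the other kind with variables X1, X2, ...

    keep_similar=True builds a similarity template (differences become variables).
    keep_similar=False builds a difference template for example `which` (0 or 1),
    with the shared fragments replaced by variables.
    """
    out: list[str] = []
    n = 0
    for is_sim, xs, ys in segments:
        if is_sim == keep_similar:
            out.extend(xs if which == 0 else ys)
        else:
            n += 1
            out.append(f"X{n}")
    return out


def learn(ex1: tuple[str, str], ex2: tuple[str, str]) -> dict[str, list[str]]:
    """Learn templates from two (source, target) examples; hypothetical helper."""
    src = match_sequence(ex1[0].split(), ex2[0].split())
    tgt = match_sequence(ex1[1].split(), ex2[1].split())
    # Simplifying assumption: pair the sides only if they have equally many differences.
    if sum(not s for s, _, _ in src) != sum(not s for s, _, _ in tgt):
        return {}

    def render(keep_similar: bool, which: int) -> str:
        left = " ".join(template(src, keep_similar, which))
        right = " ".join(template(tgt, keep_similar, which))
        return f"{left} <-> {right}"

    return {
        "similarity": [render(True, 0)],
        "difference": [render(False, k) for k in (0, 1)],
    }


if __name__ == "__main__":
    result = learn(("I ate an apple", "kumain ako ng mansanas"),
                   ("I ate a banana", "kumain ako ng saging"))
    print(result)
```

On this toy pair, `learn` yields `I ate X1 <-> kumain ako ng X1` as the similarity template, and `X1 an apple <-> X1 mansanas` and `X1 a banana <-> X1 saging` as difference templates. A real system would still need an alignment step to decide which source and target differences correspond and to reject pairings that do not.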
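The evaluation figures are easier to read with the metric definitions in hand. Under common definitions, which the abstract does not restate (so whitespace tokenization and a single reference per sentence are assumptions here), WER is the word-level edit distance between a system translation and its reference divided by the number of reference words, and SER is the fraction of sentences whose translation does not exactly match the reference. A minimal Python sketch; the function names are hypothetical:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words.

    Computed as a word-level Levenshtein distance with a rolling row.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # deletion of reference word r
                         cur[j - 1] + 1,           # insertion of hypothesis word h
                         prev[j - 1] + (r != h))   # substitution, or match at no cost
        prev = cur
    return prev[-1] / len(ref)


def sentence_error_rate(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (reference, hypothesis) pairs that differ in at least one word."""
    if not pairs:
        raise ValueError("need at least one sentence pair")
    return sum(ref.split() != hyp.split() for ref, hyp in pairs) / len(pairs)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` is 2/6: one substitution and one deletion over six reference words. Corpus-level WER is often computed by summing edit distances over all sentences before dividing by the total reference length, rather than by averaging per-sentence rates.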