Combining similarity and difference templates for a bidirectional example-based machine translation

The research presents TExt-2, an extension of TExt Translation that learns and uses both similarity and difference templates. The system learns templates from a sentence-aligned bilingual corpus. Template extraction was based only on the surface form of the text. An alignment method...
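
As a rough illustration of what learning templates from the surface form can look like, the sketch below compares two sentences on one language side only and derives one similarity template (shared words kept, differing spans replaced by variables) and two difference templates (differing words kept, shared spans replaced by variables). It is not the thesis's algorithm: the function name `learn_templates`, the `<D1>` and `<S1>` variable notation, and the use of Python's `difflib` for the comparison are all assumptions made for this example, and the bilingual alignment step is left out.

```python
from difflib import SequenceMatcher


def learn_templates(sent_a, sent_b):
    """Compare two sentences token by token (surface form only).

    Returns (similarity_template, difference_template_a, difference_template_b).
    Hypothetical helper for illustration; not taken from TExt-2.
    """
    a, b = sent_a.split(), sent_b.split()
    opcodes = SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes()

    similarity, diff_a, diff_b = [], [], []
    shared_count = 0     # numbers the shared spans (<S1>, <S2>, ...)
    differing_count = 0  # numbers the differing spans (<D1>, <D2>, ...)

    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal":
            # Shared span: kept in the similarity template,
            # generalized to a variable in the difference templates.
            shared_count += 1
            similarity.extend(a[i1:i2])
            diff_a.append(f"<S{shared_count}>")
            diff_b.append(f"<S{shared_count}>")
        else:
            # Differing span: generalized in the similarity template,
            # kept verbatim in each sentence's difference template.
            differing_count += 1
            similarity.append(f"<D{differing_count}>")
            diff_a.extend(a[i1:i2])
            diff_b.extend(b[j1:j2])

    return " ".join(similarity), " ".join(diff_a), " ".join(diff_b)


templates = learn_templates("I bought a book", "I bought a pen")
print(templates)
```

A full system in this family would run the comparison on both the source and target sides of two translation examples and then align the variables across languages, which is the role the abstract assigns to its alignment method.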

Bibliographic Details
Main Author: Nuñez, Vince Andrew D.
Format: text
Language: English
Published: Animo Repository 2007
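Subjects: Translators (Computer programs); Template matching (Digital image processing); Machine learning; Information organization; Information retrieval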
Online Access: https://animorepository.dlsu.edu.ph/etd_masteral/3524
https://animorepository.dlsu.edu.ph/context/etd_masteral/article/10362/viewcontent/CDTG004329_P.pdf
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:etd_masteral-10362
record_format eprints
spelling oai:animorepository.dlsu.edu.ph:etd_masteral-10362 2023-10-04T08:19:15Z 2007-01-01T08:00:00Z text application/pdf Master's Theses English Animo Repository
institution De La Salle University
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
language English
topic Translators (Computer programs)
Template matching (Digital image processing)
Machine learning
Information organization
Information retrieval
description The research presents TExt-2, an extension of TExt Translation that learns and uses both similarity and difference templates. The system learns templates from a sentence-aligned bilingual corpus. Template extraction was based only on the surface form of the text. An alignment method was used to find correspondences between the source and target sentences. Compared to TExt Translation, the combination of difference templates, similarity templates, and Chunk Refinement increased the number of templates learned by 60 and the number of chunks learned by 43. When the training corpus was translated, TExt-2 translated 12 more sentences correctly than TExt Translation. When the sentences for translation had patterns that were not encountered during training, TExt-2 also achieved a WER lower by 0.21%, a CWP higher by 0.94%, and a BLEU score higher by 0.0344. When the corpus for translation was based on the patterns learned by an STTL system, TExt Translation performed better: its WER was lower by 1.68%, its SER lower by 13.33%, its CWP higher by 1.13%, and its BLEU score higher by 0.0344. TExt Translation scored better because the sentences to be translated match the templates learned through STTL, the algorithm used in TExt Translation. However, in the manual evaluation of the same corpus, TExt Translation and TExt-2 tied. This indicates that even where the systems produced different translations, the translations had the same meaning. To improve the performance of the system, a large lexicon and a full morphological analyzer could be added. This would eliminate the need to manually enter words into the lexicon every time a new corpus is used. Semantic analysis could be added to improve the selection of templates and chunks. The DTC process could also be restricted in what it learns, since it sometimes learns templates that are difficult to match and use because common words are generalized.
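
For readers unfamiliar with the error metrics named in the description, the sketch below computes WER and SER under their common definitions: word-level edit distance divided by the number of reference words, and the fraction of sentences that do not match the reference exactly. This assumes WER and SER mean word error rate and sentence error rate here; the record does not spell them out, and the thesis may define or normalize them differently. The function names and the sample sentences are invented for illustration and are not data from the study.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(references, hypotheses):
    """Total word edits divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split())
                 for r, h in zip(references, hypotheses))
    total_words = sum(len(r.split()) for r in references)
    return errors / total_words


def ser(references, hypotheses):
    """Fraction of hypotheses that differ from their reference."""
    wrong = sum(r.split() != h.split()
                for r, h in zip(references, hypotheses))
    return wrong / len(references)


# Made-up example sentences, not taken from the thesis corpus.
refs = ["the cat sat on the mat", "she reads a book"]
hyps = ["the cat sat on mat", "she reads a book"]
print(f"WER = {wer(refs, hyps):.2%}, SER = {ser(refs, hyps):.2%}")
```

Because both metrics compare surface strings, two translations with the same meaning but different wording are scored differently, which is consistent with the description's remark that the manual evaluation produced a tie where the automatic scores did not.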
format text
author Nuñez, Vince Andrew D.
title Combining similarity and difference templates for a bidirectional example-based machine translation
publisher Animo Repository
publishDate 2007
url https://animorepository.dlsu.edu.ph/etd_masteral/3524
https://animorepository.dlsu.edu.ph/context/etd_masteral/article/10362/viewcontent/CDTG004329_P.pdf
_version_ 1779260486197444608