Combining similarity and difference templates for a bidirectional example-based machine translation

Bibliographic Details
Main Author:	Nuñez, Vince Andrew D.
Format:	text
Language:	English
Published:	Animo Repository 2007
Subjects:	Translators (Computer programs); Template matching (Digital image processing); Machine learning; Information organization; Information retrieval
Online Access:	https://animorepository.dlsu.edu.ph/etd_masteral/3524 https://animorepository.dlsu.edu.ph/context/etd_masteral/article/10362/viewcontent/CDTG004329_P.pdf
Institution:	De La Salle University

id	oai:animorepository.dlsu.edu.ph:etd_masteral-10362
record_format	eprints
building	De La Salle University Library
continent	Asia
country	Philippines
content_provider	De La Salle University Library
collection	DLSU Institutional Repository
description	The research presents TExt-2, an extension of TExt Translation that learns and uses both similarity and difference templates. The system learns templates from a sentence-aligned bilingual corpus, and template extraction is based only on the surface form of the text. An alignment method is used to find correspondences between the source and target sentences. Compared with TExt Translation, the combination of difference templates, similarity templates, and Chunk Refinement increased the number of templates learned by 60 and the number of chunks learned by 43. When the training corpus itself was translated, TExt-2 translated 12 more sentences correctly than TExt Translation. When the sentences to be translated had patterns not encountered during training, TExt-2 also achieved a word error rate (WER) 0.21% lower, a CWP 0.94% higher, and a BLEU score 0.0344 higher than TExt Translation. When the test corpus was based on the patterns learned by an STTL system, the results reversed: TExt Translation's WER was lower by 1.68%, its sentence error rate (SER) was lower by 13.33%, its CWP was higher by 1.13%, and its BLEU score was higher by 0.0344. This is because the sentences to be translated matched the templates learned through STTL, which is the algorithm TExt Translation uses. However, in the manual evaluation of the same corpus, TExt Translation and TExt-2 tied, indicating that although the two systems produced different translations, those translations conveyed the same meaning. To improve the system's performance, a large lexicon and a full morphological analyzer could be added, which would remove the need to enter words into the lexicon manually every time a new corpus is used. Semantic analysis could be added to improve the selection of templates and chunks. The DTC process could also be restricted in what it learns, since it sometimes learns templates that are difficult to match and use because common words are generalized.
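The abstract says TExt-2 learns similarity templates from pairs of translation examples using only surface forms. As a rough illustration only, the Python sketch below generalizes two example pairs that differ in a single span per side. Shared words are kept, the differing span is replaced by a variable (the hypothetical names X and Y), and the differing spans are paired as candidate correspondences. It assumes whitespace tokenization, exactly one differing span per side, and that the differing source span translates the differing target span. It is not the thesis's algorithm, which also relies on an alignment method, difference templates, and Chunk Refinement. The function names and the English-Filipino example sentences are hypothetical.

```python
from typing import NamedTuple, Optional

# Hypothetical illustration of similarity-template generalization; not the TExt-2 implementation.


class SimilarityTemplate(NamedTuple):
    source: tuple[str, ...]  # source-side tokens with a variable in place of the differing span
    target: tuple[str, ...]  # target-side tokens with a variable in place of the differing span


def split_common(a: list[str], b: list[str]) -> tuple[list[str], list[str], list[str], list[str]]:
    """Split two token lists into (shared prefix, a's middle, b's middle, shared suffix)."""
    shortest = min(len(a), len(b))
    p = 0
    while p < shortest and a[p] == b[p]:
        p += 1
    s = 0
    # Stop before the shared suffix would overlap the shared prefix in the shorter list.
    while s < shortest - p and a[-1 - s] == b[-1 - s]:
        s += 1
    return a[:p], a[p:len(a) - s], b[p:len(b) - s], a[len(a) - s:]


def learn_similarity_template(
    pair1: tuple[str, str], pair2: tuple[str, str], var_src: str = "X", var_tgt: str = "Y"
) -> Optional[tuple[SimilarityTemplate, list[tuple[str, str]]]]:
    """Generalize two (source, target) examples that differ in one span per side.

    Returns the template plus the paired differing spans, or None when any
    differing span is empty (the examples cannot be generalized this way).
    """
    src1, tgt1 = pair1[0].split(), pair1[1].split()
    src2, tgt2 = pair2[0].split(), pair2[1].split()
    s_pre, s_d1, s_d2, s_suf = split_common(src1, src2)
    t_pre, t_d1, t_d2, t_suf = split_common(tgt1, tgt2)
    if not (s_d1 and s_d2 and t_d1 and t_d2):
        return None
    template = SimilarityTemplate(
        tuple(s_pre + [var_src] + s_suf), tuple(t_pre + [var_tgt] + t_suf)
    )
    # Assumption: the differing source span translates the differing target span.
    differences = [(" ".join(s_d1), " ".join(t_d1)), (" ".join(s_d2), " ".join(t_d2))]
    return template, differences


if __name__ == "__main__":
    result = learn_similarity_template(
        ("I will drink coffee", "Iinom ako ng kape"),
        ("I will drink water", "Iinom ako ng tubig"),
    )
    print(result)
```

Matching the shared prefix and suffix keeps the sketch short. A real system would need a proper alignment step to handle several differing spans or reordered words.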