Bridging Philippine languages with multilingual neural machine translation

The Philippines is home to more than 150 languages that are considered low-resourced, resulting in little effort to develop translation systems for most of them. To aid in improving the results and processes of translation systems for low-resource languages, multilingual NMT became...


Bibliographic Details
Main Author: Baliber, Renz Iver D.
Format: text
Language: English
Published: Animo Repository 2021
Subjects:
Online Access:https://animorepository.dlsu.edu.ph/etdm_comsci/8
https://animorepository.dlsu.edu.ph/cgi/viewcontent.cgi?article=1008&context=etdm_comsci
Institution: De La Salle University
Language: English
id oai:animorepository.dlsu.edu.ph:etdm_comsci-1008
record_format eprints
spelling oai:animorepository.dlsu.edu.ph:etdm_comsci-10082021-11-22T07:23:33Z Bridging Philippine languages with multilingual neural machine translation Baliber, Renz Iver D. The Philippines is home to more than 150 languages that are considered low-resourced, resulting in little effort to develop translation systems for most of them. To aid in improving the results and processes of translation systems for low-resource languages, multilingual NMT became an active area of research. However, existing works in multilingual NMT disregard the analysis of a multilingual model on a closely related, low-resource language group in the context of zero-resource translation. In this study, we benchmarked translation systems for several Philippine languages and provide an analysis of a transformer-based multilingual NMT system for morphologically rich, low-resource languages in terms of its capability to translate unseen language pairs using zero-shot translation and pivot-based translation. Our studies show that, due to the architectural design of the Transformer model, common words and sentence-length differences affect the performance of a multilingual NMT model in translating both seen and unseen language pairs, with Bicolano, Cebuano, and Hiligaynon consistently performing better than the other languages in various translation tasks by having a good balance of commonality and sentence-length difference. This work also investigated the effect of increasing the model size and capacity, which allowed the model to build a language-invariant shared representation space and stronger decoding capabilities for zero-shot translation; the previous, smaller-capacity model failed to develop a language-invariant shared representation space and could only produce translations up to English when attempting zero-shot translation. Since we are dealing with low-resource multilingual data, some of the risks involved are domain shift and out-of-vocabulary words. We have also shown how the multilingual NMT model leverages joint byte-pair encoding and the shared representation space to produce translations for unseen or rare words. Lastly, we have shown that the transformer-based multilingual NMT model can compete with, or outperform, other translation approaches in a comparative analysis against baseline statistical MT models, where several statistical translation models were produced to compare their performance against a single multilingual NMT model. The results show that the translation performance of the multilingual NMT model is superior to the statistical MT models in both a bidirectional English-Philippine languages translation task and a pivot-based Philippine languages translation task, where the multilingual NMT model retained information and context across multilingual translation, something the statistical MT models failed to do. The multilingual NMT model also produced competitive results against a directly trained NMT model in a bidirectional Cebuano-Tagalog translation task, where the pivot-based approach of the multilingual NMT model scored 6.72 and 7.20 BLEU against the 9.54 and 10.55 BLEU of a directly trained NMT model for the Tagalog-to-Cebuano and Cebuano-to-Tagalog translation tasks, even though the multilingual NMT model had no parallel Cebuano-Tagalog data, demonstrating the effectiveness of a multilingual NMT model in building translation systems for low-resource languages. 2021-07-14T07:00:00Z text application/pdf https://animorepository.dlsu.edu.ph/etdm_comsci/8 https://animorepository.dlsu.edu.ph/cgi/viewcontent.cgi?article=1008&context=etdm_comsci Computer Science Master's Theses English Animo Repository Philippine languages—Translations Translators (Computer programs) Computer Sciences
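For illustration, the pivot-based translation described in the abstract routes a sentence through English when no direct parallel data exists between two Philippine languages (e.g. Tagalog to Cebuano). The minimal Python sketch below shows that two-hop control flow only; the `TranslateFn` interface, the `dummy` translator, and the language codes are hypothetical placeholders, not the thesis system's actual API.

```python
from typing import Callable

# A single multilingual NMT model is modeled here as a callable
# (text, src, tgt) -> translation. This interface is a hypothetical
# stand-in; the thesis does not publish its decode API.
TranslateFn = Callable[[str, str, str], str]

def pivot_translate(translate: TranslateFn, text: str,
                    src: str, tgt: str, pivot: str = "en") -> str:
    """Translate src -> tgt through a pivot language (English by default)."""
    intermediate = translate(text, src, pivot)   # e.g. Tagalog -> English
    return translate(intermediate, pivot, tgt)   # e.g. English -> Cebuano

# Toy usage with a dummy translator, just to show the two-hop flow:
def dummy(text: str, src: str, tgt: str) -> str:
    return f"[{src}->{tgt}] {text}"

print(pivot_translate(dummy, "Kumusta ka?", src="tl", tgt="ceb"))
# prints: [en->ceb] [tl->en] Kumusta ka?
```

This is also why the smaller-capacity model described above could produce translations only "up to English": the first hop succeeded, but the decoder could not carry the shared representation through to a non-English target in zero-shot mode.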
institution De La Salle University
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
language English
topic Philippine languages—Translations
Translators (Computer programs)
Computer Sciences
spellingShingle Philippine languages—Translations
Translators (Computer programs)
Computer Sciences
Baliber, Renz Iver D.
Bridging Philippine languages with multilingual neural machine translation
description The Philippines is home to more than 150 languages that are considered low-resourced, resulting in little effort to develop translation systems for most of them. To aid in improving the results and processes of translation systems for low-resource languages, multilingual NMT became an active area of research. However, existing works in multilingual NMT disregard the analysis of a multilingual model on a closely related, low-resource language group in the context of zero-resource translation. In this study, we benchmarked translation systems for several Philippine languages and provide an analysis of a transformer-based multilingual NMT system for morphologically rich, low-resource languages in terms of its capability to translate unseen language pairs using zero-shot translation and pivot-based translation. Our studies show that, due to the architectural design of the Transformer model, common words and sentence-length differences affect the performance of a multilingual NMT model in translating both seen and unseen language pairs, with Bicolano, Cebuano, and Hiligaynon consistently performing better than the other languages in various translation tasks by having a good balance of commonality and sentence-length difference. This work also investigated the effect of increasing the model size and capacity, which allowed the model to build a language-invariant shared representation space and stronger decoding capabilities for zero-shot translation; the previous, smaller-capacity model failed to develop a language-invariant shared representation space and could only produce translations up to English when attempting zero-shot translation. Since we are dealing with low-resource multilingual data, some of the risks involved are domain shift and out-of-vocabulary words. We have also shown how the multilingual NMT model leverages joint byte-pair encoding and the shared representation space to produce translations for unseen or rare words. Lastly, we have shown that the transformer-based multilingual NMT model can compete with, or outperform, other translation approaches in a comparative analysis against baseline statistical MT models, where several statistical translation models were produced to compare their performance against a single multilingual NMT model. The results show that the translation performance of the multilingual NMT model is superior to the statistical MT models in both a bidirectional English-Philippine languages translation task and a pivot-based Philippine languages translation task, where the multilingual NMT model retained information and context across multilingual translation, something the statistical MT models failed to do. The multilingual NMT model also produced competitive results against a directly trained NMT model in a bidirectional Cebuano-Tagalog translation task, where the pivot-based approach of the multilingual NMT model scored 6.72 and 7.20 BLEU against the 9.54 and 10.55 BLEU of a directly trained NMT model for the Tagalog-to-Cebuano and Cebuano-to-Tagalog translation tasks, even though the multilingual NMT model had no parallel Cebuano-Tagalog data, demonstrating the effectiveness of a multilingual NMT model in building translation systems for low-resource languages.
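The description credits joint byte-pair encoding plus the shared representation space for translating rare or unseen words. The sketch below shows the underlying mechanism with an invented merge list: an out-of-vocabulary word decomposes into subword units that related languages share. It is a simplified greedy variant of BPE, not the thesis's actual tokenizer, and the merges are made up for illustration; a real joint vocabulary is learned from the concatenated multilingual corpus.

```python
# Simplified sketch of joint byte-pair encoding (BPE) segmentation.
# Real BPE repeatedly merges the highest-priority adjacent pair; this
# greedy variant applies merges in list order, which is enough to show
# how an unseen word decomposes into shared subword units.

def apply_bpe(word: str, merges: list) -> list:
    symbols = list(word)
    for a, b in merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == a and symbols[i + 1] == b:
                symbols[i:i + 2] = [a + b]   # merge the pair in place
            else:
                i += 1
    return symbols

# Hypothetical merges such as might be learned from related Philippine
# languages that share affixes (e.g. the "nag-" verb prefix):
merges = [("n", "a"), ("na", "g"), ("k", "a"), ("ka", "w"),
          ("l", "a"), ("la", "kaw")]

# Even if "naglakaw" (Cebuano: "walked") never appeared for one language
# during training, it still maps onto subwords in the joint vocabulary.
print(apply_bpe("naglakaw", merges))   # ['nag', 'lakaw']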
format text
author Baliber, Renz Iver D.
author_facet Baliber, Renz Iver D.
author_sort Baliber, Renz Iver D.
title Bridging Philippine languages with multilingual neural machine translation
title_short Bridging Philippine languages with multilingual neural machine translation
title_full Bridging Philippine languages with multilingual neural machine translation
title_fullStr Bridging Philippine languages with multilingual neural machine translation
title_full_unstemmed Bridging Philippine languages with multilingual neural machine translation
title_sort bridging philippine languages with multilingual neural machine translation
publisher Animo Repository
publishDate 2021
url https://animorepository.dlsu.edu.ph/etdm_comsci/8
https://animorepository.dlsu.edu.ph/cgi/viewcontent.cgi?article=1008&context=etdm_comsci
_version_ 1718383353250447360