Exploring clustering of Philippine languages in multilingual neural machine translation
Multilingual neural machine translation (MNMT) is a single model capable of translating several language directions. This has been shown to aid in translating low-resource languages such as Philippine languages. Moreover, there is also the empirical observation that clustering linguistically similar...
Main Author: | Coronia, Jeremy Dale O |
---|---|
Format: | text |
Language: | English |
Published: | Animo Repository 2022 |
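Subjects: | Natural language processing (Computer science); Philippine languages--Machine translating; Computer Sciences |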
Online Access: | https://animorepository.dlsu.edu.ph/etdm_softtech/4 https://animorepository.dlsu.edu.ph/cgi/viewcontent.cgi?article=1006&context=etdm_softtech |
Institution: | De La Salle University |
id |
oai:animorepository.dlsu.edu.ph:etdm_softtech-1006 |
---|---|
record_format |
eprints |
spelling |
oai:animorepository.dlsu.edu.ph:etdm_softtech-1006 2022-12-13T06:44:05Z Exploring clustering of Philippine languages in multilingual neural machine translation Coronia, Jeremy Dale O
2022-12-01T08:00:00Z text application/pdf https://animorepository.dlsu.edu.ph/etdm_softtech/4 https://animorepository.dlsu.edu.ph/cgi/viewcontent.cgi?article=1006&context=etdm_softtech Software Technology Master's Theses English Animo Repository Natural language processing (Computer science) Philippine languages--Machine translating Computer Sciences |
institution |
De La Salle University |
building |
De La Salle University Library |
continent |
Asia |
country |
Philippines |
content_provider |
De La Salle University Library |
collection |
DLSU Institutional Repository |
language |
English |
topic |
Natural language processing (Computer science) Philippine languages--Machine translating Computer Sciences |
description |
Multilingual neural machine translation (MNMT) uses a single model to translate across several language directions. This approach has been shown to aid the translation of low-resource languages such as Philippine languages. There is also the empirical observation that clustering linguistically similar languages together can further improve translation performance. Based on these findings, we propose to develop multilingual Filipino neural machine translation systems in which Philippine languages are clustered into groups based on previous computational work and linguistic studies. Several cluster-specific models were built around language families or other computational frameworks centered on the language relatedness of English and eight (8) Philippine languages. The choice of pivot language in pivot-based translation was also explored. Experiments show that language clusters give translation scores comparable to or higher than those of a baseline universal model; for example, Tagalog, Cebuano and Hiligaynon are more likely to perform better when clustered with each other. Experiments also show that pivot-based translation still scores higher than zero-shot translation, and that English remains the best pivot in a universal translation model setting. Finally, some issues are discussed with regard to the use of conventional automatic metrics on translation outputs involving Philippine languages. |
format |
text |
author |
Coronia, Jeremy Dale O |
title |
Exploring clustering of Philippine languages in multilingual neural machine translation |
publisher |
Animo Repository |
publishDate |
2022 |
url |
https://animorepository.dlsu.edu.ph/etdm_softtech/4 https://animorepository.dlsu.edu.ph/cgi/viewcontent.cgi?article=1006&context=etdm_softtech |