Translate-train embracing translationese artifacts
Translate-train is a general training approach to multilingual tasks. The key idea is to use the translator of the target language to generate training data to mitigate the gap between the source and target languages. However, its performance is often hampered by the artifacts in the translated text...
Main Authors: | YU, Sicheng; SUN, Qianru; ZHANG, Hao; JIANG, Jing |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2022 |
Subjects: | Databases and Information Systems; Programming Languages and Compilers |
Online Access: | https://ink.library.smu.edu.sg/sis_research/7475 https://ink.library.smu.edu.sg/context/sis_research/article/8478/viewcontent/2022.acl_short.40.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-8478 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-8478; 2022-11-03T06:51:13Z; Translate-train embracing translationese artifacts; YU, Sicheng; SUN, Qianru; ZHANG, Hao; JIANG, Jing; 2022-05-01T07:00:00Z; text; application/pdf; https://ink.library.smu.edu.sg/sis_research/7475; info:doi/10.18653/v1/2022.acl-short.40; https://ink.library.smu.edu.sg/context/sis_research/article/8478/viewcontent/2022.acl_short.40.pdf; http://creativecommons.org/licenses/by-nc-nd/4.0/; Research Collection School Of Computing and Information Systems; eng; Institutional Knowledge at Singapore Management University; Databases and Information Systems; Programming Languages and Compilers |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Databases and Information Systems; Programming Languages and Compilers |
description |
Translate-train is a general training approach to multilingual tasks. The key idea is to use the translator of the target language to generate training data to mitigate the gap between the source and target languages. However, its performance is often hampered by the artifacts in the translated texts (translationese). We discover that such artifacts have common patterns in different languages and can be modeled by deep learning, and subsequently propose an approach to conduct translate-train using Translationese Embracing the effect of Artifacts (TEA). TEA learns to mitigate such effect on the training data of a source language (whose original and translationese are both available), and applies the learned module to facilitate the inference on the target language. Extensive experiments on the multilingual QA dataset TyDiQA demonstrate that TEA outperforms strong baselines. |
format |
text |
author |
YU, Sicheng; SUN, Qianru; ZHANG, Hao; JIANG, Jing |
author_sort |
YU, Sicheng |
title |
Translate-train embracing translationese artifacts |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2022 |
url |
https://ink.library.smu.edu.sg/sis_research/7475 https://ink.library.smu.edu.sg/context/sis_research/article/8478/viewcontent/2022.acl_short.40.pdf |
_version_ |
1770576353161117696 |