Translate-train embracing translationese artifacts

Bibliographic Details
Main Authors: YU, Sicheng; SUN, Qianru; ZHANG, Hao; JIANG, Jing
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2022
Online Access: https://ink.library.smu.edu.sg/sis_research/7475
https://ink.library.smu.edu.sg/context/sis_research/article/8478/viewcontent/2022.acl_short.40.pdf
Institution: Singapore Management University
DOI: 10.18653/v1/2022.acl-short.40
Date: 2022-05-01
Collection: Research Collection School Of Computing and Information Systems (InK@SMU)
Topics: Databases and Information Systems; Programming Languages and Compilers
License: CC BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Description: Translate-train is a general training approach to multilingual tasks. The key idea is to use the translator of the target language to generate training data to mitigate the gap between the source and target languages. However, its performance is often hampered by the artifacts in the translated texts (translationese). We discover that such artifacts have common patterns in different languages and can be modeled by deep learning, and subsequently propose an approach to conduct translate-train using Translationese Embracing the effect of Artifacts (TEA). TEA learns to mitigate such effect on the training data of a source language (whose original and translationese are both available), and applies the learned module to facilitate the inference on the target language. Extensive experiments on the multilingual QA dataset TyDiQA demonstrate that TEA outperforms strong baselines.
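The description explains TEA only at a high level: a module is learned on source-language data, where both the original text and its translationese are available, and is then reused at inference time on the target language. The Python sketch below illustrates that train-on-source, apply-on-target pattern under loudly stated assumptions. It is not the authors' module (the description says the artifacts are modeled with deep learning). Instead it swaps in a hypothetical per-dimension linear map, fitted by ordinary least squares, from feature vectors of original text to feature vectors of the paired translationese. The direction of the map, the names `ArtifactMap` and `fit_artifact_map`, and the toy feature values are all illustrative assumptions.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import List, Sequence

Vector = List[float]


@dataclass
class ArtifactMap:
    """HYPOTHETICAL stand-in for a learned artifact module (not TEA itself).

    Models each feature dimension as: translationese ~= scale * original + shift.
    """
    scale: Vector
    shift: Vector

    def apply(self, x: Sequence[float]) -> Vector:
        # Move an "original" feature vector toward the translationese space.
        return [a * v + b for a, v, b in zip(self.scale, x, self.shift)]


def fit_artifact_map(original: Sequence[Sequence[float]],
                     translationese: Sequence[Sequence[float]]) -> ArtifactMap:
    """Fit the map on paired source-language features (least squares per dimension)."""
    if not original or len(original) != len(translationese):
        raise ValueError("need a non-empty set of paired examples")
    scale: Vector = []
    shift: Vector = []
    for d in range(len(original[0])):
        xs = [row[d] for row in original]
        ys = [row[d] for row in translationese]
        mx, my = fmean(xs), fmean(ys)
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        # Fall back to a pure shift if this dimension is constant in the data.
        a = cov / var if var > 0 else 1.0
        scale.append(a)
        shift.append(my - a * mx)
    return ArtifactMap(scale, shift)


if __name__ == "__main__":
    # Made-up source-language pairs: (original features, translationese features).
    src_original = [[0.0, 1.0], [1.0, 2.0], [2.0, 4.0]]
    src_translationese = [[1.0, 0.5], [3.0, 1.5], [5.0, 3.5]]
    module = fit_artifact_map(src_original, src_translationese)
    # Reuse the source-fitted module on (made-up) target-language original features.
    print(module.apply([4.0, 0.0]))
```

The point of the sketch is the data flow the description names: the module is fitted only where original and translationese pairs exist (the source language) and is then applied unchanged to target-language inputs, where such pairs are unavailable. A linear map keeps the example short and checkable; the record does not say which direction or architecture TEA actually uses.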