Adaptation in Statistical Machine Translation for Low-resource Domains in English-Vietnamese Language

In this paper, we propose a new method for domain adaptation in Statistical Machine Translation (SMT) for low-resource domains in the English-Vietnamese language pair. Specifically, our method uses only monolingual data to adapt the translation phrase-table, and our system brings improvements over the SMT baseline sys...
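
The record's description names two steps: classifying target-side phrases of the phrase-table with a probabilistic classifier, and recomputing the direct translation probability of each phrase pair. The sketch below illustrates one way such a pipeline could look. It is not the authors' implementation: the reweighting rule (multiply the original probability by the classifier's in-domain score, then renormalize per source phrase), the function names `adapt_phrase_table` and `toy_in_domain_prob`, and all numeric values are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# (source phrase, target phrase, direct translation probability)
PhraseEntry = Tuple[str, str, float]


def adapt_phrase_table(
    entries: List[PhraseEntry],
    in_domain_prob: Callable[[str], float],
) -> List[PhraseEntry]:
    """Reweight direct translation probabilities with a domain classifier.

    ASSUMPTION: the new score is prob * P(in-domain | target phrase),
    renormalized over all candidates that share a source phrase.
    The paper's actual recomputation formula is not given in this record.
    """
    # Step (i): score each target phrase with the classifier.
    weighted = [(src, tgt, prob * in_domain_prob(tgt)) for src, tgt, prob in entries]

    # Step (ii): renormalize so candidates for each source phrase sum to 1.
    totals: Dict[str, float] = defaultdict(float)
    for src, _, weight in weighted:
        totals[src] += weight

    adapted = []
    for (src, tgt, weight), (_, _, original) in zip(weighted, entries):
        if totals[src] > 0:
            adapted.append((src, tgt, weight / totals[src]))
        else:
            # Classifier rejected every candidate: keep the original value.
            adapted.append((src, tgt, original))
    return adapted


# Hypothetical stand-in for a trained probabilistic classifier.
TOY_SCORES = {"hợp đồng": 0.9, "giao kèo": 0.3}


def toy_in_domain_prob(target_phrase: str) -> float:
    # Unknown phrases get a neutral score of 0.5 (an arbitrary choice).
    return TOY_SCORES.get(target_phrase, 0.5)


# Toy phrase-table with made-up probabilities.
table = [
    ("contract", "hợp đồng", 0.5),
    ("contract", "giao kèo", 0.5),
]
adapted = adapt_phrase_table(table, toy_in_domain_prob)
for src, tgt, prob in adapted:
    print(f"{src} ||| {tgt} ||| {prob:.3f}")
```

In this toy run the candidate that the stand-in classifier scores as more in-domain receives a larger share of the probability mass, while the candidates for each source phrase still sum to one. A real system would train the classifier on monolingual in-domain text and write the adjusted scores back into the decoder's phrase-table.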

Bibliographic Details
Main Authors: Pham, Nghia-Luan, Nguyen, Van-Vinh
Format: Article
Language: English
Published: H. : ĐHQGHN 2020
Subjects: Machine Translation, Statistical Machine Translation, Domain Adaptation
Online Access: http://repository.vnu.edu.vn/handle/VNU_123/89094
https://doi.org/10.25073/2588-1086/vnucsce.231
Institution: Vietnam National University, Hanoi
id oai:112.137.131.14:VNU_123-89094
record_format dspace
spelling oai:112.137.131.14:VNU_123-89094 2020-06-23T02:29:49Z
Pham, N-L., & Nguyen, V-V. (2019). Adaptation in Statistical Machine Translation for Low-resource Domains in English-Vietnamese Language. VNU Journal of Science: Comp. Science & Com. Eng., Vol. 36, No. 1 (2020) 46-56.
2588-1086
Computer Science and Communication Engineering; application/pdf
institution Vietnam National University, Hanoi
building VNU Library & Information Center
country Vietnam
collection VNU Digital Repository
language English
topic Machine Translation
Statistical Machine Translation
Domain Adaptation
description In this paper, we propose a new method for domain adaptation in Statistical Machine Translation (SMT) for low-resource domains in the English-Vietnamese language pair. Specifically, our method uses only monolingual data to adapt the translation phrase-table, and our system brings improvements over the SMT baseline system. We propose two steps to improve the quality of the SMT system: (i) classify phrases on the target side of the translation phrase-table using a probabilistic classifier model, and (ii) adapt the translation phrase-table by recomputing the direct translation probability of phrases. Our experiments are conducted in the English-to-Vietnamese translation direction on two very different domains: the legal domain (out-of-domain) and the general domain (in-domain). The English-Vietnamese parallel corpus is provided by the IWSLT 2015 organizers, and the experimental results show that our method significantly outperforms the baseline system. Our system improves machine translation quality in the legal domain by up to 0.9 BLEU points over the baseline system,…
format Article
author Pham, Nghia-Luan
Nguyen, Van-Vinh
title Adaptation in Statistical Machine Translation for Low-resource Domains in English-Vietnamese Language