Adaptation in Statistical Machine Translation for Low-resource Domains in English-Vietnamese Language
In this paper, we propose a new method for domain adaptation in Statistical Machine Translation for low-resource domains in the English-Vietnamese language pair. Specifically, our method uses only monolingual data to adapt the translation phrase-table, and our system brings improvements over the SMT baseline sys...
Main Authors: | Pham, Nghia-Luan; Nguyen, Van-Vinh |
---|---|
Format: | Article |
Language: | English |
Published: | H. : ĐHQGHN, 2020 |
Subjects: | Machine Translation; Statistical Machine Translation; Domain Adaptation |
Online Access: | http://repository.vnu.edu.vn/handle/VNU_123/89094 https://doi.org/10.25073/2588-1086/vnucsce.231 |
Institution: | Vietnam National University, Hanoi |
id |
oai:112.137.131.14:VNU_123-89094 |
---|---|
record_format |
dspace |
spelling |
oai:112.137.131.14:VNU_123-89094 2020-06-23T02:29:49Z Adaptation in Statistical Machine Translation for Low-resource Domains in English-Vietnamese Language Pham, Nghia-Luan Nguyen, Van-Vinh Machine Translation Statistical Machine Translation Domain Adaptation In this paper, we propose a new method for domain adaptation in Statistical Machine Translation for low-resource domains in the English-Vietnamese language pair. Specifically, our method uses only monolingual data to adapt the translation phrase-table, and our system brings improvements over the SMT baseline system. We propose two steps to improve the quality of the SMT system: (i) classify phrases on the target side of the translation phrase-table using a probabilistic classifier model, and (ii) adapt the translation phrase-table by recomputing the direct translation probability of phrases. Our experiments are conducted in the English-to-Vietnamese translation direction on two very different domains: the legal domain (out-of-domain) and the general domain (in-domain). The English-Vietnamese parallel corpus is provided by the IWSLT 2015 organizers, and the experimental results show that our method significantly outperforms the baseline system. Our system improves the quality of machine translation in the legal domain by up to 0.9 BLEU points over the baseline system,… 2020-06-23T02:29:48Z 2020-06-23T02:29:48Z 2019 Article Pham, N-L., & Nguyen, V-V. (2019). Adaptation in Statistical Machine Translation for Low-resource Domains in English-Vietnamese Language. VNU Journal of Science: Comp. Science & Com. Eng., Vol. 36, No. 1 (2020) 46-56. 2588-1086 http://repository.vnu.edu.vn/handle/VNU_123/89094 https://doi.org/10.25073/2588-1086/vnucsce.231 en Computer Science and Communication Engineering; application/pdf H. : ĐHQGHN |
institution |
Vietnam National University, Hanoi |
building |
VNU Library & Information Center |
country |
Vietnam |
collection |
VNU Digital Repository |
language |
English |
topic |
Machine Translation Statistical Machine Translation Domain Adaptation |
spellingShingle |
Machine Translation Statistical Machine Translation Domain Adaptation Pham, Nghia-Luan Nguyen, Van-Vinh Adaptation in Statistical Machine Translation for Low-resource Domains in English-Vietnamese Language |
description |
In this paper, we propose a new method for domain adaptation in Statistical Machine Translation for low-resource domains in the English-Vietnamese language pair. Specifically, our method uses only monolingual data to adapt the translation phrase-table, and our system brings improvements over the SMT baseline system. We propose two steps to improve the quality of the SMT system: (i) classify phrases on the target side of the translation phrase-table using a probabilistic classifier model, and (ii) adapt the translation phrase-table by recomputing the direct translation probability of phrases. Our experiments are conducted in the English-to-Vietnamese translation direction on two very different domains: the legal domain (out-of-domain) and the general domain (in-domain). The English-Vietnamese parallel corpus is provided by the IWSLT 2015 organizers, and the experimental results show that our method significantly outperforms the baseline system. Our system improves the quality of machine translation in the legal domain by up to 0.9 BLEU points over the baseline system,… |
format |
Article |
author |
Pham, Nghia-Luan Nguyen, Van-Vinh |
author_facet |
Pham, Nghia-Luan Nguyen, Van-Vinh |
author_sort |
Pham, Nghia-Luan |
title |
Adaptation in Statistical Machine Translation for Low-resource Domains in English-Vietnamese Language |
title_short |
Adaptation in Statistical Machine Translation for Low-resource Domains in English-Vietnamese Language |
title_full |
Adaptation in Statistical Machine Translation for Low-resource Domains in English-Vietnamese Language |
title_fullStr |
Adaptation in Statistical Machine Translation for Low-resource Domains in English-Vietnamese Language |
title_full_unstemmed |
Adaptation in Statistical Machine Translation for Low-resource Domains in English-Vietnamese Language |
title_sort |
adaptation in statistical machine translation for low-resource domains in english-vietnamese language |
publisher |
H. : ĐHQGHN |
publishDate |
2020 |
url |
http://repository.vnu.edu.vn/handle/VNU_123/89094 https://doi.org/10.25073/2588-1086/vnucsce.231 |
_version_ |
1680965392300769280 |
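
The two steps named in the description (classify target-side phrases, then recompute the direct translation probability) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy unigram Naive Bayes classifier stands in for the unspecified probabilistic classifier, the reweighting rule p'(t|s) ∝ p(t|s) · P(domain | t) is one plausible way to recompute the direct probability, and the function names, monolingual sentences, and phrase-table entries are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_unigram_nb(docs_by_label):
    """Count words per domain label from monolingual text (toy stand-in classifier)."""
    counts = {label: Counter(w for doc in docs for w in doc.split())
              for label, docs in docs_by_label.items()}
    vocab = set().union(*counts.values())
    return counts, vocab


def domain_prob(phrase, model, label):
    """P(label | phrase) with uniform priors and add-one smoothing."""
    counts, vocab = model
    log_scores = {}
    for lab, c in counts.items():
        total = sum(c.values()) + len(vocab) + 1  # +1 reserves mass for unseen words
        log_scores[lab] = sum(math.log((c[w] + 1) / total) for w in phrase.split())
    top = max(log_scores.values())
    z = sum(math.exp(s - top) for s in log_scores.values())
    return math.exp(log_scores[label] - top) / z


def adapt_phrase_table(table, model, label):
    """Assumed rule: scale p(target|source) by P(label|target), renormalize per source."""
    weighted = defaultdict(dict)
    for (src, tgt), p in table.items():
        weighted[src][tgt] = p * domain_prob(tgt, model, label)
    adapted = {}
    for src, tgts in weighted.items():
        z = sum(tgts.values())
        for tgt, w in tgts.items():
            adapted[(src, tgt)] = w / z
    return adapted


# Hypothetical Vietnamese monolingual data for the two domains.
model = train_unigram_nb({
    "legal": ["toà tuyên bản án", "bản án có hiệu lực"],
    "general": ["tôi viết một câu", "câu này rất dài"],
})

# Hypothetical phrase-table entries: (source, target) -> direct probability p(target|source).
table = {("sentence", "câu"): 0.7, ("sentence", "bản án"): 0.3}

adapted = adapt_phrase_table(table, model, "legal")
for (src, tgt), p in sorted(adapted.items()):
    print(f"{src} -> {tgt}: {p:.3f}")
```

With this toy data, probability mass for the source phrase shifts toward the translation that the classifier judges more likely to be legal text, while the scores for each source phrase still sum to 1. A real phrase-table carries several feature scores per entry; this sketch touches only the direct translation probability.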