Adaptation in Statistical Machine Translation for Low-resource Domains in English-Vietnamese Language

In this paper, we propose a new method for domain adaptation in Statistical Machine Translation for low-resource domains in English-Vietnamese language. Specifically, our method only uses monolingual data to adapt the translation phrase-table, our system brings improvements over the SMT baseline sys...

Full description

Saved in:
Bibliographic Details
Main Authors: Pham, Nghia-Luan, Nguyen, Van-Vinh
Format: Article
Language:English
Published: H. : ĐHQGHN 2020
Subjects:
Online Access:http://repository.vnu.edu.vn/handle/VNU_123/89094
https://doi.org/10.25073/2588-1086/vnucsce.231
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Vietnam National University, Hanoi
Language: English
Description
Summary:In this paper, we propose a new method for domain adaptation in Statistical Machine Translation for low-resource domains in English-Vietnamese language. Specifically, our method only uses monolingual data to adapt the translation phrase-table, our system brings improvements over the SMT baseline system. We propose two steps to improve the quality of SMT system: (i) classify phrases on the target side of the translation phrase-table use the probability classifier model, and (ii) adapt to the phrase-table translation by recomputing the direct translation probability of phrases. Our experiments are conducted with translation direction from English to Vietnamese on two very different domains that are legal domain (out-of-domain) and general domain (in-of-domain). The English-Vietnamese parallel corpus is provided by the IWSLT 2015 organizers and the experimental results showed that our method significantly outperformed the baseline system. Our system improved on the quality of machine translation in the legal domain up to 0.9 BLEU scores over the baseline system,…