Integrated linguistic to Statistical Machine Translation
Main Author: | |
---|---|
Format: | Theses and Dissertations |
Language: | other |
Published: | Đại học Quốc gia Hà Nội, 2016 |
Subjects: | |
Online Access: | http://repository.vnu.edu.vn/handle/VNU_123/8256 |
Institution: | Vietnam National University, Hanoi |
Summary: | In the field of Natural Language Processing, machine translation is an
attractive application that helps users translate sentences from one language
into another. Today, phrase-based Statistical Machine Translation is the state of
the art, with benefits in word choice and in distortion modeling based on the
distance between words. However, it still has problems with global distortion
between different languages, where reordered words lie far apart. Some
previous studies use linguistic information such as syntax trees, morphological
information, or hierarchical phrases. Similarly, we use a syntax tree
to help the distortion model. However, instead of a full parse tree, we use a
shallow syntax tree, whose height is limited. By applying transformation
rules, we rearrange the order of some nodes in the shallow syntax tree and
thereby reorder the words in the sentence. A distinctive point of our study is
that the transformation rules are applied to the source-language sentence to
produce a new sentence whose word order is similar to that of the target
language, as a preprocessing step before training the translation model or
decoding with beam search and a log-linear model. Experimental results on the
English-Vietnamese pair show that our approach achieves significant
improvements over Moses, the state-of-the-art phrase-based system. |
---|
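
The source-side reordering described in the summary can be illustrated with a minimal Python sketch. This is not the thesis implementation: the `Node` class, the `RULES` table, and the single adjective-noun rule are hypothetical, chosen only because Vietnamese places adjectives after nouns. The tree is kept shallow (sentence, chunks, tagged words) to mirror the height limit mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """A node of a shallow syntax tree (hypothetical structure)."""
    label: str                      # chunk label or part-of-speech tag
    word: str = ""                  # surface word, set on leaves only
    children: List["Node"] = field(default_factory=list)


# Hypothetical rule table: (parent label, child labels) -> new child order.
# The single rule below moves an adjective after its noun.
RULES: Dict[Tuple[str, Tuple[str, ...]], Tuple[int, ...]] = {
    ("NP", ("JJ", "NN")): (1, 0),
}


def reorder(node: Node, rules) -> Node:
    """Apply matching rules bottom-up, permuting children in place."""
    for child in node.children:
        reorder(child, rules)
    key = (node.label, tuple(c.label for c in node.children))
    order = rules.get(key)
    if order is not None:
        node.children = [node.children[i] for i in order]
    return node


def words(node: Node) -> List[str]:
    """Read the words off the tree from left to right."""
    if not node.children:
        return [node.word]
    out: List[str] = []
    for child in node.children:
        out.extend(words(child))
    return out


# Shallow tree for "I like red cars": sentence -> chunks -> tagged words.
sentence = Node("S", children=[
    Node("NP", children=[Node("PRP", "I")]),
    Node("VP", children=[Node("VBP", "like")]),
    Node("NP", children=[Node("JJ", "red"), Node("NN", "cars")]),
])

reordered = " ".join(words(reorder(sentence, RULES)))
print(reordered)  # I like cars red
```

In a pipeline like the one the summary describes, the reordered source sentences would replace the originals both in the training corpus and at decoding time, so the phrase-based system only has to handle the remaining local reordering.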