Machine translation of software-specific documentations
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | 2017 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/70166 |
Institution: | Nanyang Technological University |
Summary: | Machine translation is the automated translation of text from one language to another by computer software. It uses bilingual data to build the language and translation models that are used to translate the text, and it is one of the important steps in software localisation. Many studies have been carried out on locating need-to-translate strings in software and adapting the UI layout after the text has been translated into the new language. However, no work has been done on one of the most important and time-consuming steps: translating the software text itself. Software text has some unique characteristics, for example application-specific naming, context-sensitive translation, and domain-specific rare words, that general machine translation tools such as Google Translate cannot translate properly. Therefore, in this project, we study statistical machine translation with a phrase-based model, trained for software text translation. We collect human-translated bilingual sentence pairs from Python-related documentation on the internet. After preprocessing the data, we use an open-source toolkit for statistical machine translation. Lastly, we evaluate our test sets with BLEU (bilingual evaluation understudy) and WER (word error rate) to measure translation quality, determine how well the model performs, and identify what the problems are and what can be improved. |
---|
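The two evaluation metrics named in the summary can be illustrated with a minimal sketch. This is not the project's evaluation code, which the record does not describe; it assumes whitespace tokenisation, a single reference per sentence, and unsmoothed sentence-level BLEU, and the function names `wer` and `bleu` are hypothetical.

```python
import math
from collections import Counter


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of iteration i, prev[j] is the edit distance
    # between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU with one reference and no smoothing (an assumption)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalise hypotheses shorter than the reference.
    brevity = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    # Hypothetical sentence pair, for illustration only.
    ref = "return a new list containing the sorted items"
    hyp = "return a new list with the sorted items"
    print(f"WER:  {wer(ref, hyp):.3f}")
    print(f"BLEU: {bleu(ref, hyp):.3f}")
```

Lower WER and higher BLEU both indicate output closer to the human reference. In practice BLEU is usually reported at corpus level rather than per sentence, so scores from this sketch are not directly comparable to published figures.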