Sentence-based alignment for parallel text corpora preparation for machine translation.
Main Author: | Lee, Yong Wei |
---|---|
Format: | Final Year Project / Dissertation / Thesis |
Published: | 2021 |
Subjects: | QA75 Electronic computers. Computer science; T Technology (General) |
Online Access: | http://eprints.utar.edu.my/4261/1/17ACB04464_FYP.pdf http://eprints.utar.edu.my/4261/ |
Institution: | Universiti Tunku Abdul Rahman |
id | my-utar-eprints.4261 |
---|---|
record_format | eprints |
spelling | my-utar-eprints.4261 2022-03-09T13:04:36Z Sentence-based alignment for parallel text corpora preparation for machine translation. Lee, Yong Wei QA75 Electronic computers. Computer science T Technology (General) 2021-04-15 Final Year Project / Dissertation / Thesis NonPeerReviewed application/pdf http://eprints.utar.edu.my/4261/1/17ACB04464_FYP.pdf Lee, Yong Wei (2021) Sentence-based alignment for parallel text corpora preparation for machine translation. Final Year Project, UTAR. http://eprints.utar.edu.my/4261/ |
institution | Universiti Tunku Abdul Rahman |
building | UTAR Library |
collection | Institutional Repository |
continent | Asia |
country | Malaysia |
content_provider | Universiti Tunku Abdul Rahman |
content_source | UTAR Institutional Repository |
url_provider | http://eprints.utar.edu.my |
topic | QA75 Electronic computers. Computer science; T Technology (General) |
spellingShingle | QA75 Electronic computers. Computer science T Technology (General) Lee, Yong Wei Sentence-based alignment for parallel text corpora preparation for machine translation. |
description | Natural Language Processing (NLP) underpins many of today's downstream applications, such as speech recognition and machine translation. Machine translation matters in daily life because it can translate large volumes of text faster, and at lower cost, than human translators. In machine translation, a parallel corpus is a key resource for training translation models and for language teaching, and a high-quality parallel corpus can greatly improve translation accuracy. Sentence-based alignment of parallel text corpora therefore plays an important role in NLP, especially in machine translation. However, parallel corpora are scarce for some source and target language pairs, and machine translation accuracy for some target languages remains low. This study therefore proposes an approach for generating a parallel corpus for a source language and a target language. A parallel corpus of English (source language) and Malay (target language) is collected, and a neural machine translation system is developed using a recurrent neural network (RNN) model. The model achieves a training accuracy of 0.9, and the translated Malay text achieves a BLEU score of 0.65, which is considered a good score. |
format | Final Year Project / Dissertation / Thesis |
author | Lee, Yong Wei |
title | Sentence-based alignment for parallel text corpora preparation for machine translation. |
publishDate | 2021 |
url | http://eprints.utar.edu.my/4261/1/17ACB04464_FYP.pdf http://eprints.utar.edu.my/4261/ |
_version_ | 1728055945473294336 |