Bilingual sentence alignment based on sentence length and word translation
Main Author: | Triệu, Hải Long |
---|---|
Other Authors: | Nguyễn, Phương Thái |
Language: | English |
Published: | ĐHCN, 2017 |
Online Access: | http://repository.vnu.edu.vn/handle/VNU_123/43268 |
Institution: | Vietnam National University, Hanoi |
id | oai:112.137.131.14:VNU_123-43268 |
---|---|
record_format | dspace |
spelling | oai:112.137.131.14:VNU_123-43268 2018-07-26T07:45:45Z Bilingual sentence alignment based on sentence length and word translation Triệu, Hải Long Nguyễn, Phương Thái 2017-05-17T08:20:22Z 2017-05-17T08:20:22Z 2014 Triệu, H. L. (2014). Bilingual sentence alignment based on sentence length and word translation. Master's thesis, Vietnam National University, Hanoi 00051000190 http://repository.vnu.edu.vn/handle/VNU_123/43268 en Master's thesis, Computer Science (Full) 61 p. + CD-ROM + Summary application/pdf ĐHCN |
institution | Vietnam National University, Hanoi |
building | VNU Library & Information Center |
country | Vietnam |
collection | VNU Digital Repository |
language | English |
description | Sentence alignment plays an important role in machine translation. It is an essential task in processing parallel corpora, which are ample and substantial resources for natural language processing. To put these abundant materials to use in applications, parallel corpora first have to be aligned at the sentence level. This process maps sentences in source-language texts to their corresponding units in target-language texts. Parallel corpora aligned at the sentence level become a useful resource for a number of natural language processing applications, including statistical machine translation, word sense disambiguation, and cross-language information retrieval. The task also helps to extract structural information and derive statistical parameters from bilingual corpora. A number of algorithms with different approaches have been proposed for sentence alignment, but they can be classified into a few major categories. First, there are methods based on the similarity of sentence lengths, measured in words or characters. These methods are simple but effective for language pairs whose sentence lengths are highly similar. The second set of methods is based on word correspondences or a lexicon. These methods take lexical information about the texts into account, either by matching content across the texts or by using cognates. They may also use an external dictionary, so they are more accurate but slower than length-based methods. There are also hybrid methods that combine the first two approaches to gain the advantages of both, and they therefore achieve quite high alignment quality. In this thesis, I summarize general issues related to sentence alignment, evaluate the approaches proposed for this task, and focus on the hybrid method, especially the proposal of Moore (2002), an effective method with high precision. By analyzing the limitations of this method, I propose an algorithm that uses a new feature, bilingual word clustering, to improve the quality of Moore's method. I introduce the baseline method (Moore, 2002) by analyzing its framework and describe its advantages as well as its weaknesses. In addition, I describe the background knowledge and the algorithm of bilingual word clustering, and the new feature used in sentence alignment. Finally, I present the experiments performed in this research, together with evaluations that demonstrate the benefits of the proposed method. |
author2 | Nguyễn, Phương Thái |
author | Triệu, Hải Long |
title | Bilingual sentence alignment based on sentence length and word translation |
publisher | ĐHCN |
publishDate | 2017 |
url | http://repository.vnu.edu.vn/handle/VNU_123/43268 |
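The length-based family of methods summarized in the abstract can be illustrated with a short sketch. The code below is a minimal aligner in the style of Gale and Church (1993); it is not Moore's (2002) method or the algorithm proposed in this thesis. It scores each candidate bead (1-1, 1-0, 0-1, 2-1, 1-2, 2-2) by how plausible the character-length difference is under a normal model, adds a bead prior, and finds the cheapest path by dynamic programming. The constants `C` and `S2`, the values in `BEAD_PRIORS`, and the function names `bead_cost` and `align` are assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

Bead = Tuple[int, int]  # (number of source sentences, number of target sentences)

# ASSUMED bead priors: illustrative values loosely following Gale and Church,
# with each symmetric pair (1-0/0-1, 2-1/1-2) splitting one combined probability.
BEAD_PRIORS: Dict[Bead, float] = {
    (1, 1): 0.89,
    (1, 0): 0.00495,
    (0, 1): 0.00495,
    (2, 1): 0.0445,
    (1, 2): 0.0445,
    (2, 2): 0.011,
}
C = 1.0   # ASSUMED mean target/source character-length ratio
S2 = 6.8  # ASSUMED variance of that ratio per character


def bead_cost(src_len: int, tgt_len: int, bead: Bead) -> float:
    """Return -log P(length difference) - log P(bead) for one candidate bead."""
    mean = (src_len + tgt_len / C) / 2.0
    delta = (tgt_len - C * src_len) / math.sqrt(S2 * mean) if mean > 0 else 0.0
    # Two-tailed normal tail: 2 * (1 - Phi(|delta|)) equals erfc(|delta| / sqrt(2)).
    p_len = max(math.erfc(abs(delta) / math.sqrt(2.0)), 1e-300)
    return -math.log(p_len) - math.log(BEAD_PRIORS[bead])


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Align two sentence lists; return beads as (source indices, target indices)."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Bead]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Cells are filled in row-major order, so every predecessor is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            for di, dj in BEAD_PRIORS:
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0 or cost[pi][pj] == inf:
                    continue
                src_len = sum(len(s) for s in src[pi:i])
                tgt_len = sum(len(t) for t in tgt[pj:j])
                c = cost[pi][pj] + bead_cost(src_len, tgt_len, (di, dj))
                if c < cost[i][j]:
                    cost[i][j] = c
                    back[i][j] = (di, dj)
    # Follow back-pointers from the end to recover the cheapest bead sequence.
    beads = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads


if __name__ == "__main__":
    src = ["Hello world.", "How are you today?"]
    tgt = ["Bonjour le monde.", "Comment allez-vous aujourd'hui ?"]
    print(align(src, tgt))  # [([0], [0]), ([1], [1])]
```

This sketch uses lengths only, which is why such methods are fast but can be misled when sentence lengths diverge. A hybrid method in the spirit of Moore (2002) additionally scores beads with a word-translation model; that component is not shown here.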