Bilingual sentence alignment based on sentence length and word translation

Bibliographic Details
Main Author:	Triệu, Hải Long
Other Authors:	Nguyễn, Phương Thái
Language:	English
Published:	ĐHCN 2017
Online Access:	http://repository.vnu.edu.vn/handle/VNU_123/43268
Institution:	Vietnam National University, Hanoi

id	oai:112.137.131.14:VNU_123-43268
record_format	dspace
spelling	oai:112.137.131.14:VNU_123-43268; 2018-07-26T07:45:45Z; Bilingual sentence alignment based on sentence length and word translation; Triệu, Hải Long; Nguyễn, Phương Thái; 2017-05-17T08:20:22Z; 2017-05-17T08:20:22Z; 2014; Triệu, H. L. (2014). Bilingual sentence alignment based on sentence length and word translation. Master's thesis, Vietnam National University, Hanoi; 00051000190; http://repository.vnu.edu.vn/handle/VNU_123/43268; en; Thesis, Computer Science (Full); 61 p. + CD-ROM + summary; application/pdf; ĐHCN
institution	Vietnam National University, Hanoi
building	VNU Library & Information Center
country	Vietnam
collection	VNU Digital Repository
language	English
description	Sentence alignment plays an important role in machine translation. It is an essential task in processing parallel corpora, which are ample and substantial resources for natural language processing. To put these abundant materials to use in practical applications, parallel corpora first have to be aligned at the sentence level. This process maps sentences in source-language texts to their corresponding units in target-language texts. Parallel corpora aligned at the sentence level become a useful resource for a number of natural language processing applications, including statistical machine translation, word sense disambiguation, and cross-language information retrieval. Sentence alignment also helps to extract structural information and derive statistical parameters from bilingual corpora. A number of algorithms with different approaches have been proposed for sentence alignment; however, they can be grouped into a few major categories. First, there are methods based on the similarity of sentence lengths, measured in words or characters. These methods are simple but effective for language pairs whose sentence lengths are highly similar. The second set of methods is based on word correspondences or a lexicon. These methods take lexical information about the texts into account, either by matching content in the texts or by using cognates. They may also use an external dictionary, and they are more accurate but slower than length-based methods. Finally, there are hybrid methods that combine the first two approaches and their advantages, and so obtain alignments of quite high quality. In this thesis, I summarize general issues related to sentence alignment, evaluate the approaches proposed for this task, and focus on the hybrid method, especially the proposal of Moore (2002), an effective method with high precision.
From an analysis of the limitations of this method, I propose an algorithm that uses a new feature, bilingual word clustering, to improve the quality of Moore's method. The baseline method (Moore, 2002) is introduced through an analysis of its framework, and I describe the advantages as well as the weaknesses of this approach. In addition, I describe the background knowledge and the algorithm of bilingual word clustering, as well as the new feature used in sentence alignment. Finally, the experiments performed in this research are presented, along with evaluations showing the benefits of the proposed method.
author2	Nguyễn, Phương Thái
author_facet	Nguyễn, Phương Thái Triệu, Hải Long
author	Triệu, Hải Long
title	Bilingual sentence alignment based on sentence length and word translation
publisher	ĐHCN
publishDate	2017
url	http://repository.vnu.edu.vn/handle/VNU_123/43268
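The length-based category described in the abstract can be illustrated with a small dynamic-programming aligner. The sketch below is a Gale and Church (1993)-style model swapped in for illustration; it is not the method of this thesis or of Moore (2002). Sentence lengths are counted in characters, each candidate pairing ("bead") is scored by a prior for its type plus a penalty for how far the two lengths diverge, and the cheapest path through both texts is kept. The constants C and S2, the prior table PRIORS, and the function names are assumptions chosen for this example.

```python
import math
from typing import List, Optional, Tuple

# Assumed constants (roughly the values reported by Gale and Church, 1993):
# C  = expected number of target characters per source character,
# S2 = variance of the length difference per source character.
C = 1.0
S2 = 6.8

# Assumed prior probability of each bead type, keyed by
# (number of source sentences, number of target sentences).
# Combined categories (e.g. "1-0 or 0-1") are split evenly here.
PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.0099 / 2,
    (0, 1): 0.0099 / 2,
    (2, 1): 0.089 / 2,
    (1, 2): 0.089 / 2,
    (2, 2): 0.011,
}

Bead = Tuple[List[int], List[int]]


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def length_cost(l1: int, l2: int) -> float:
    """-log P(delta): penalty for pairing l1 source chars with l2 target chars."""
    if l1 == 0 and l2 == 0:
        return 0.0
    # Averaging both lengths (in source units) avoids dividing by zero
    # when one side of the bead is empty.
    mean = (l1 + l2 / C) / 2.0
    delta = (l2 - l1 * C) / math.sqrt(mean * S2)
    # Two-tailed probability of a deviation at least this large.
    p = 2.0 * (1.0 - norm_cdf(abs(delta)))
    return -math.log(max(p, 1e-12))


def align(src: List[str], tgt: List[str]) -> List[Bead]:
    """Return the minimum-cost sequence of beads (source indices, target indices)."""
    n, m = len(src), len(tgt)
    ls = [len(s) for s in src]
    lt = [len(t) for t in tgt]
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order visits every predecessor of a cell before the cell itself.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                total = (cost[i][j] - math.log(prior)
                         + length_cost(sum(ls[i:ni]), sum(lt[j:nj])))
                if total < cost[ni][nj]:
                    cost[ni][nj] = total
                    back[ni][nj] = (i, j)
    # Trace back from the end of both texts to recover the beads.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

For example, align(["Hello there.", "How are you today?"], ["Bonjour.", "Comment allez-vous?"]) pairs the two sentences one-to-one. A hybrid method of the kind discussed in the abstract would add a word-translation score to each bead's cost; that part is not shown here.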

Similar Items