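Improving the quality of statistical machine translation systems by using monolingual corpora in the source language (Cải tiến chất lượng hệ dịch máy thống kê bằng cách sử dụng kho ngữ liệu đơn ngữ trong ngôn ngữ nguồn)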

Bibliographic Details
Main Author: Vũ, Huy Hiển
Other Authors: Nguyễn, Phương Thái
Format: Theses and Dissertations
Published: ĐHCN 2017
Online Access: http://repository.vnu.edu.vn/handle/VNU_123/43272
Institution: Vietnam National University, Hanoi
Description
Summary: Statistical machine translation has attracted broad interest from researchers thanks to its advantages. However, statistical approaches are persistently limited by the scarcity of parallel and domain-specific corpora. Building these corpora requires intensive human effort and the availability of experts. Unfortunately, only a few widely spoken languages receive sustained financial support and research attention for the development of machine translation systems. For most other languages, very little funding or interest is available, which makes applying statistical approaches to them a major obstacle. The purpose of this thesis is to propose a method that uses unannotated corpora to address this obstacle.