Enhancing the quality of Machine Translation System Using Cross-Lingual Word Embedding Models = Nâng cao chất lượng của hệ thống dịch máy dựa trên các mô hình vector nhúng biểu diễn từ giữa hai ngôn ngữ. Master's thesis, Computer Science: 84801

Bibliographic Details
Main Author: Nguyễn, Minh Thuận, 1993-
Other Authors: Nguyễn, Phương Thái
Format: Theses and Dissertations
Language: English
Published: 2019
Subjects:
6.3
Online Access: http://repository.vnu.edu.vn/handle/VNU_123/65766
Institution: Vietnam National University, Hanoi
Description
Summary: In recent years, machine translation has shown promising results and attracted considerable interest from researchers. Two widely used approaches are Phrase-based Statistical Machine Translation (PBSMT) and Neural Machine Translation (NMT). Both rely heavily on large bilingual corpora, which require considerable effort and funding to build. A lack of bilingual data leads to a poor phrase-table, one of the main components of PBSMT, and to the unknown word problem in NMT. In contrast, monolingual data are available for most languages. Taking advantage of this, many word embedding and cross-lingual word embedding models have been developed to improve the quality of various natural language processing tasks. This thesis proposes two models that use cross-lingual word embeddings to address the above impediment. The first enhances the quality of the phrase-table in SMT, and the second tackles the unknown word problem in NMT. Publications: Minh-Thuan Nguyen, Van-Tan Bui, Huy-Hien Vu, Phuong-Thai Nguyen and Chi-Mai Luong. Enhancing the quality of Phrase-table in Statistical Machine Translation for Less-Common and Low-Resource Languages. In the 2018 International Conference on Asian Language Processing (IALP 2018).