Neural machine translation for discourse phenomena


Bibliographic Details
Main Author: Shen, Youlin
Other Authors: Joty, Shafiq Rayhan
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
Subjects:
Online Access: https://hdl.handle.net/10356/144540
Description
Summary: In recent years, Neural Machine Translation (NMT) has received increasing interest in the natural language processing (NLP) field and has achieved state-of-the-art results on numerous tasks. In this final year project, we explore how NMT models can be adapted to handle discourse phenomena in machine translation and how to properly evaluate a model's performance. Current NMT models are mainly sentence-level systems, and their performance has improved to the point of reportedly reaching human parity [2]. However, when model outputs are evaluated together at the document level rather than as individual sentences, humans show a strong preference for professionally translated text over machine translation [3]. This indicates that sentence-level NMT systems may produce good translations of isolated sentences, but when put into context, these individual translations can contradict each other. In other words, sentence-level models perform badly at maintaining discourse phenomena. In this project, we apply Docrepair [1], a context-aware NMT model, to tackle the discourse phenomena problem. Docrepair is a monolingual document-level model for correcting the "translationese" produced in the process; that is, Docrepair performs automatic post-editing on a sequence of sentences and refines the overall translation using the surrounding sentences as context. The model aims to map an inconsistent group of sentences into a more natural one from a human point of view. Furthermore, the evaluation criteria used for most NMT models are standard automatic metrics such as BLEU. These metrics are poorly adapted to evaluating a model's performance on discourse phenomena. To give researchers in this field more evidence on the quality of system output, Jwala et al. proposed a comprehensive benchmark framework for evaluating discourse phenomena [4]. The benchmarking framework checks four discourse phenomena, namely Anaphora, Lexical Consistency, Coherence, and Readability.
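The benchmark's actual checks are defined in [4]; purely as a rough illustration of the idea behind one of the four phenomena, a naive lexical-consistency score could measure how uniformly each source term is rendered across a document. The `lexical_consistency` helper below and its aligned-pair input format are hypothetical, not part of the actual framework:

```python
from collections import defaultdict

def lexical_consistency(term_pairs):
    """Fraction of source terms translated the same way throughout one document.

    term_pairs: (source_term, translated_term) pairs observed across the
    document's sentences, e.g. produced by a word aligner (assumed input).
    Returns 1.0 when every source term has exactly one translation.
    """
    renderings = defaultdict(set)
    for src, tgt in term_pairs:
        renderings[src].add(tgt)
    if not renderings:
        return 1.0  # vacuously consistent: no terms observed
    consistent = sum(1 for tgts in renderings.values() if len(tgts) == 1)
    return consistent / len(renderings)

# "Katze" is rendered two different ways, so only 1 of 2 terms is consistent.
pairs = [("Hund", "dog"), ("Hund", "dog"), ("Katze", "cat"), ("Katze", "kitty")]
print(lexical_consistency(pairs))  # 0.5
```

A real check would additionally restrict itself to terms whose inconsistent rendering is actually an error (synonyms can be legitimate), which is why the benchmark in [4] uses curated test sets rather than a raw ratio like this.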
In this project, we build an online application, DiscourseGym, which makes the benchmark framework available to all researchers in the NLP field. The application features test-set downloads with various filters and options, automatic model-output evaluation, and enhanced visualizations that give users more information about their model's performance. DiscourseGym also features a leaderboard on which researchers can compare their models' performance, and it supports community contributions through user-submitted test sets.