Neural machine translation for discourse phenomena

Bibliographic Details
Main Author: Shen, Youlin
Other Authors: Joty, Shafiq Rayhan
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
Subjects:
Online Access:https://hdl.handle.net/10356/144540
Institution: Nanyang Technological University
Description
Summary: In recent years, Neural Machine Translation (NMT) has received increasing interest in the natural language processing (NLP) field and has achieved state-of-the-art results on numerous tasks. In this final year project, we investigate how NMT models can be adapted to handle discourse phenomena in machine translation and how to properly evaluate a model's performance. Current NMT models are mainly sentence-level systems, and their performance has improved to the point of reportedly reaching human parity [2]. However, when model outputs are evaluated together at the document level rather than as individual sentences, humans show a strong preference for professionally translated text over machine translation [3]. This indicates that sentence-level NMT systems may produce good translations of isolated sentences, but when placed in context these individual translations can contradict each other; in other words, sentence-level models perform poorly at maintaining discourse phenomena. In this project, we apply Docrepair [1], a context-aware NMT model, to tackle the discourse phenomena problem. Docrepair is a monolingual document-level model for correcting the 'translationese' produced by sentence-level translation: it performs automatic post-editing on a sequence of sentences, refining the overall translation by treating the surrounding sentences as context. The model aims to map an inconsistent group of sentences to one that reads more naturally from a human point of view. Furthermore, the evaluation criteria used for most NMT models are standard automatic metrics such as BLEU, which are poorly suited to measuring performance on discourse phenomena. To give researchers in this field more evidence about the quality of system outputs, Jwala et al. proposed a comprehensive benchmark framework for evaluating discourse phenomena [4]. The benchmark checks four discourse phenomena, namely anaphora, lexical consistency, coherence, and readability. In this project, we build an online application, DiscourseGym, available to all researchers in the NLP field, to make the benchmark framework easy to use. The application features test-set downloads with various filters and options, automatic evaluation of model outputs, and enhanced visualisations that give users more information about their models' performance. DiscourseGym also features a leaderboard for researchers to compare model performance and allows community contribution through user-submitted test sets.
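
To make the document-level post-editing idea above more concrete, the sketch below shows one way such a repair pass could be wired around a sentence-level MT system. This is only an illustration under stated assumptions, not the project's or Docrepair's actual code: the separator token, window size, and the `repair_model` callable are hypothetical stand-ins for a trained monolingual seq2seq model.

```python
from typing import Callable, List

SEP = " _eos "   # hypothetical separator token between sentences (assumption)
WINDOW = 4       # hypothetical number of consecutive sentences repaired together


def repair_document(sentences: List[str],
                    repair_model: Callable[[str], str]) -> List[str]:
    """Post-edit sentence-level MT output in fixed-size windows.

    `repair_model` stands in for any monolingual document-level model that maps
    a group of possibly inconsistent sentences to a more natural group; it is a
    plain callable here so the sketch stays self-contained and runnable.
    """
    repaired: List[str] = []
    for start in range(0, len(sentences), WINDOW):
        group = sentences[start:start + WINDOW]
        joined = SEP.join(group)          # concatenate the window into one input
        output = repair_model(joined)     # the model rewrites the whole window
        repaired.extend(s.strip() for s in output.split(SEP))
    return repaired


if __name__ == "__main__":
    # Identity "model" used only to demonstrate the data flow.
    mt_output = ["He left the house.", "She was angry at it.", "It slammed shut."]
    print(repair_document(mt_output, repair_model=lambda text: text))
```

The design point is that the repair model sees several consecutive sentences at once, so it has the cross-sentence context that a sentence-level translator lacks when making pronoun, lexical-choice, and coherence decisions.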
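For comparison with the benchmark framework, the snippet below shows how corpus-level BLEU, the kind of standard automatic metric criticised above, is commonly computed. The `sacrebleu` package is not mentioned in the project description; it is assumed here as one widely used implementation, and the example sentences are invented for illustration.

```python
# Corpus-level BLEU with the sacrebleu package (pip install sacrebleu).
import sacrebleu

hypotheses = [
    "He left the house and she was angry at him.",
    "The door slammed shut.",
]
references = [
    "He left the house, and she was angry with him.",
    "The door slammed shut.",
]

# corpus_bleu takes a list of hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")
```

Because such metrics score n-gram overlap against references without looking across sentence boundaries, they cannot reward or penalise discourse-level choices, which is the motivation for evaluating anaphora, lexical consistency, coherence, and readability separately in DiscourseGym.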