Building generalizable models for discourse phenomena evaluation and machine translation
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2023 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/165027 |
Institution: | Nanyang Technological University |
Summary: | The neural revolution in machine translation has made it easier to model larger contexts beyond the sentence level, which can potentially help resolve some discourse-level ambiguities and enable better translations. Although machine translation systems increasingly incorporate contextual information, the evidence for translation quality improvement is sparse, especially for discourse phenomena. Most of these phenomena go virtually unnoticed by traditional automatic evaluation measures such as BLEU. This work presents test sets and evaluation measures for four discourse phenomena: anaphora, lexical consistency, discourse connectives, and coherence, and highlights the need for such fine-grained evaluation. We present benchmarking results for several context-aware machine translation models using these test sets and evaluation measures, showing that performance is not always consistent across languages. We also present a targeted fine-tuning strategy that improves pronoun translation by leveraging errors in already-seen training data and additional losses, instead of building specialized architectures that do not generalize across languages. |