Building generalizable models for discourse phenomena evaluation and machine translation

Bibliographic Details
Main Author: Jwalapuram, Prathyusha
Other Authors: Joty, Shafiq Rayhan
Format: Thesis (Doctor of Philosophy)
Language: English
Published: Nanyang Technological University, 2023
Subjects:
Online Access: https://hdl.handle.net/10356/165027
Description
Abstract: The neural revolution in machine translation has made it easier to model larger contexts beyond the sentence level, which can potentially help resolve some discourse-level ambiguities and enable better translations. Although machine translation systems increasingly incorporate contextual information, evidence of improved translation quality is sparse, especially for discourse phenomena, most of which go virtually unnoticed by traditional automatic evaluation measures such as BLEU. This work presents test sets and evaluation measures for four discourse phenomena: anaphora, lexical consistency, discourse connectives, and coherence, and highlights the need for such fine-grained evaluation. We present benchmarking results for several context-aware machine translation models using these test sets and evaluation measures, showing that performance is not always consistent across languages. We also present a targeted fine-tuning strategy that improves pronoun translation by leveraging errors in already-seen training data and additional losses, rather than building specialized architectures that do not generalize across languages.