Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning

We present a study on reinforcement learning (RL) from human bandit feedback for sequence-to-sequence learning, exemplified by the task of bandit neural machine translation (NMT). We investigate the reliability of human bandit feedback, and analyze the influence of reliability on the learnability of...

Bibliographic Details
Main Authors: Kreutzer, Julia; Uyheng, Joshua; Riezler, Stefan
Format: text
Published: Archīum Ateneo 2018
Online Access: https://archium.ateneo.edu/psychology-faculty-pubs/357
https://aclanthology.org/P18-1165/
Institution: Ateneo De Manila University