Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning
We present a study on reinforcement learning (RL) from human bandit feedback for sequence-to-sequence learning, exemplified by the task of bandit neural machine translation (NMT). We investigate the reliability of human bandit feedback, and analyze the influence of reliability on the learnability of...
Main Authors: | , , |
---|---|
Format: | text |
Published: | Archīum Ateneo, 2018 |
Online Access: | https://archium.ateneo.edu/psychology-faculty-pubs/357 https://aclanthology.org/P18-1165/ |
Institution: | Ateneo De Manila University |