Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning

We present a study on reinforcement learning (RL) from human bandit feedback for sequence-to-sequence learning, exemplified by the task of bandit neural machine translation (NMT). We investigate the reliability of human bandit feedback, and analyze the influence of reliability on the learnability of...

Bibliographic Details
Main Authors:	Kreutzer, Julia, Uyheng, Joshua, Riezler, Stefan
Format:	text
Published:	Archīum Ateneo 2018
Subjects:	Cognitive Psychology; Psychology
Online Access:	https://archium.ateneo.edu/psychology-faculty-pubs/357 https://aclanthology.org/P18-1165/
Institution:	Ateneo De Manila University

Similar Items