Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning
We present a study on reinforcement learning (RL) from human bandit feedback for sequence-to-sequence learning, exemplified by the task of bandit neural machine translation (NMT). We investigate the reliability of human bandit feedback, and analyze the influence of reliability on the learnability of...
Main Authors: | Kreutzer, Julia, Uyheng, Joshua, Riezler, Stefan |
---|---|
Format: | text |
Published: | Archīum Ateneo 2018 |
Online Access: | https://archium.ateneo.edu/psychology-faculty-pubs/357 https://aclanthology.org/P18-1165/ |
Institution: | Ateneo De Manila University |
Similar Items
- Is there evidence for cross-domain congruency sequence effect? A replication of Kan et al. (2013)
  by: ACZEL, Balazs, et al.
  Published: (2021)
- Leadership and Human Values
  by: DEKLE, Dawn Jeanine
  Published: (2003)
- Beyond the Special Case: Applying Neural Theories of Consciousness to Non-Human Animals
  by: FARBER, Ilya
  Published: (2007)
- Variational learning from implicit bandit feedback
  by: TRUONG, Quoc Tuan, et al.
  Published: (2021)
- Cognitive Polyphasia in a Global South Populist Democracy: Mapping Social Representations of Duterte's Regime in the Philippines
  by: Montiel, Cristina Jayme, et al.
  Published: (2020)