Online Learning with Off-Policy Feedback

By Germano Gabbianelli and others
We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback. In this sequential decision-making problem, the learner cannot directly observe its rewards, but instead sees the ones obtained by another unknown policy run in parallel (the behavior policy). Instead of a...
July 18, 2022