
Importance-Weighted Offline Learning Done Right

By Germano Gabbianelli and others
We study the problem of offline policy optimization in stochastic contextual bandit problems, where the goal is to learn a near-optimal policy based on a dataset of decision data collected by a suboptimal behavior policy. Rather than making any structural assumptions on the reward function, we assume access to a...
September 27, 2023
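To make the setting concrete, here is a minimal sketch of plain importance weighting (inverse propensity scoring) for offline policy selection in a contextual bandit. It is a generic baseline that the title alludes to, not the estimator proposed in the paper. The record layout `(context, action, reward, propensity)`, the restriction to deterministic candidate policies over integer contexts and actions, and the names `ips_value` and `select_policy` are hypothetical choices made for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical logged record: (context, logged action, observed reward,
# behavior-policy probability of the logged action in that context).
Record = Tuple[int, int, float, float]
# Hypothetical simplification: a deterministic policy maps a context to an action.
Policy = Callable[[int], int]


def ips_value(policy: Policy, data: Sequence[Record]) -> float:
    """Importance-weighted (IPS) estimate of a deterministic policy's mean reward."""
    total = 0.0
    for context, action, reward, propensity in data:
        # Importance weight pi(a|x) / mu(a|x): for a deterministic target policy,
        # pi(a|x) is 1 when it picks the logged action and 0 otherwise.
        weight = (1.0 if policy(context) == action else 0.0) / propensity
        total += weight * reward
    return total / len(data)


def select_policy(policies: Sequence[Policy], data: Sequence[Record]) -> int:
    """Return the index of the candidate with the highest IPS estimate (greedy, no pessimism)."""
    scores = [ips_value(p, data) for p in policies]
    return max(range(len(policies)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Toy log: two contexts, two actions, uniform behavior policy (propensity 0.5).
    log = [
        (0, 0, 1.0, 0.5),
        (0, 1, 0.0, 0.5),
        (1, 0, 0.0, 0.5),
        (1, 1, 1.0, 0.5),
    ]
    candidates = [lambda x: 0, lambda x: 1, lambda x: x]
    print([ips_value(p, log) for p in candidates])  # [0.5, 0.5, 1.0]
    print(select_policy(candidates, log))  # 2
```

For a fixed target policy this estimate is unbiased when every propensity is positive, but its variance grows as logged propensities shrink, so greedily maximizing it over many candidates can favor policies whose estimates are inflated by noise.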