
Offline RL via Feature-Occupancy Gradient Ascent

By Gergely Neu and Nneka Okolo
We study offline Reinforcement Learning in large infinite-horizon discounted Markov Decision Processes (MDPs) when the reward and transition models are linearly realizable under a known feature map. Starting from the classic linear-program formulation of the optimal control problem in MDPs, we develop a new algorithm that performs a form of...
May 22, 2024
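
The abstract above is truncated, but the setting it describes can be illustrated with a toy computation: in the linear-programming view of optimal control, the expected return is linear in the feature occupancy, i.e. the feature-weighted sum of a state-action occupancy measure. The sketch below is not the paper's algorithm; it is a minimal illustration that runs plain gradient ascent on a penalized occupancy objective in a small randomly generated MDP, assuming a linear reward model r(s, a) = <phi(s, a), theta>. The feature map phi, reward parameter theta, transition kernel, penalty weight, and step size are all hypothetical choices made for the example.

```python
# Illustrative sketch only (not the paper's method): gradient ascent on a
# state-action occupancy measure in a small MDP with a linear reward model
# r(s, a) = <phi(s, a), theta>. All quantities below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, d = 4, 2, 3
gamma = 0.9

# Hypothetical feature map phi(s, a) in R^d and reward parameter theta.
phi = rng.normal(size=(n_states, n_actions, d))
theta = rng.normal(size=d)

# Hypothetical transition kernel P[s, a, s'] and initial distribution nu0.
P = rng.random(size=(n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)
nu0 = np.full(n_states, 1.0 / n_states)


def flow_violation(mu):
    """Residual of the discounted flow constraints for occupancy mu[s, a]."""
    # sum_a mu(s', a) should equal (1 - gamma) nu0(s') + gamma sum_{s,a} mu(s,a) P(s'|s,a).
    inflow = (1 - gamma) * nu0 + gamma * np.einsum("sa,sat->t", mu, P)
    return mu.sum(axis=1) - inflow


def objective(logits):
    """Linear return in the feature occupancy, minus a flow-constraint penalty."""
    mu = np.exp(logits - logits.max())
    mu /= mu.sum()                                   # softmax keeps mu a distribution
    feat_occ = np.einsum("sa,sad->d", mu, phi)       # feature occupancy in R^d
    return feat_occ @ theta - penalty * np.sum(flow_violation(mu) ** 2)


def numeric_grad(f, x, eps=1e-5):
    """Central-difference gradient; fine for a toy-sized example."""
    g = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        orig = x[idx]
        x[idx] = orig + eps; fp = f(x)
        x[idx] = orig - eps; fm = f(x)
        x[idx] = orig
        g[idx] = (fp - fm) / (2 * eps)
    return g


# Gradient ascent on softmax-parameterized occupancies (hypothetical step size / penalty).
logits = np.zeros((n_states, n_actions))
step, penalty = 0.5, 10.0
for _ in range(500):
    logits += step * numeric_grad(objective, logits)

mu = np.exp(logits - logits.max()); mu /= mu.sum()
print("penalized return: ", objective(logits))
print("feature occupancy:", np.einsum("sa,sad->d", mu, phi))
```

The softmax parameterization and quadratic penalty are only one convenient way to keep the iterate a valid distribution while discouraging flow-constraint violations; they stand in for whatever constraint-handling the paper's algorithm actually uses.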