
Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation

By Chen-Yu Wei and others
We develop several new algorithms for learning Markov Decision Processes in an infinite-horizon average-reward setting with linear function approximation. Using the optimism principle and assuming that the MDP has a linear structure, we first propose a computationally inefficient algorithm with optimal $\widetilde{O}(\sqrt{T})$ regret and another computationally efficient variant with $\widetilde{O}(T^{3/4})$...
April 26, 2021
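The abstract states regret bounds without defining regret. As a minimal sketch, assuming the standard average-reward definition $R_T = \sum_{t=1}^{T} (J^* - r(s_t, a_t))$, where $J^*$ is the optimal long-run average reward (our notation, not quoted from the paper), the Python below computes regret for a toy reward sequence and the ratio between the two bounds. Ignoring constants and logarithmic factors, that ratio is $T^{3/4} / \sqrt{T} = T^{1/4}$. The function names and the toy trajectory are hypothetical.

```python
import math
from typing import Sequence


def average_reward_regret(rewards: Sequence[float], optimal_gain: float) -> float:
    """Cumulative regret R_T = sum_t (J* - r_t) over the T observed steps.

    Assumes the standard infinite-horizon average-reward definition, where
    optimal_gain is J*, the best achievable long-run average reward.
    """
    return sum(optimal_gain - r for r in rewards)


def bound_ratio(T: int) -> float:
    """Ratio of the T^{3/4} bound to the sqrt(T) bound, ignoring constants and logs.

    Algebraically this equals T^{1/4}.
    """
    return T ** 0.75 / math.sqrt(T)


if __name__ == "__main__":
    # Hypothetical trajectory: per-step rewards 0.4, 0.5, 0.6 when J* = 0.6.
    print(average_reward_regret([0.4, 0.5, 0.6], 0.6))  # approximately 0.3
    print(bound_ratio(10**8))  # approximately 100
```

For example, at $T = 10^8$ steps the efficient variant's bound is about 100 times larger than the optimal one, yet both grow sublinearly in $T$, so the per-step regret $R_T / T$ still tends to zero.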