By Chen-Yu Wei and others

We develop several new algorithms for learning Markov Decision Processes in an infinite-horizon average-reward setting with linear function approximation. Using the optimism principle and assuming that the MDP has a linear structure, we first propose a computationally inefficient algorithm with optimal $\widetilde{O}(\sqrt{T})$ regret and another computationally efficient variant with $\widetilde{O}(T^{3/4})$...

April 26, 2021

Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
