
Whittle index based Q-learning for restless bandits with average reward

By Konstantin Avrachenkov and Vivek Borkar
A novel reinforcement learning algorithm is introduced for multiarmed restless bandits with average reward, using the paradigms of Q-learning and Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. Rigorous convergence analysis is provided, supported...
March 9, 2021
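To illustrate why an index policy shrinks the search space, here is a minimal Python sketch of the Whittle index policy as it is commonly defined. At each step it activates the M arms whose current states carry the largest indices. It assumes the per-arm index tables are already available (for example, learned), and it is not the authors' learning algorithm. The function names `whittle_index_policy`, `joint_q_table_size`, and `index_table_size` are hypothetical. The size comparison against a naive joint Q-table (one entry per joint state and per choice of active arms) is a rough counting illustration only, not the paper's complexity analysis.

```python
from math import comb
from typing import List, Sequence


def whittle_index_policy(
    states: Sequence[int], index: Sequence[Sequence[float]], m: int
) -> List[int]:
    """Return the (sorted) ids of the m arms to activate.

    states[a] is the current state of arm a, and index[a][s] is an
    (estimated) Whittle index of arm a in state s. The m arms whose current
    states have the largest indices are activated; ties go to the lower arm id.
    """
    ranked = sorted(range(len(states)), key=lambda a: (-index[a][states[a]], a))
    return sorted(ranked[:m])


def joint_q_table_size(n_arms: int, n_states: int, m: int) -> int:
    # Hypothetical naive baseline: one entry per (joint state, set of m active arms).
    return n_states ** n_arms * comb(n_arms, m)


def index_table_size(n_arms: int, n_states: int) -> int:
    # One index value per (arm, state) pair.
    return n_arms * n_states


if __name__ == "__main__":
    states = [0, 2, 1, 2]
    index = [[0.1, 0.5, 0.9], [0.2, 0.4, 0.6], [0.0, 0.3, 0.8], [0.5, 0.7, 1.0]]
    print(whittle_index_policy(states, index, m=2))  # [1, 3]
    print(joint_q_table_size(10, 5, 3))  # 1171875000
    print(index_table_size(10, 5))  # 50
```

With 10 arms, 5 states per arm, and 3 arms activated per step, the naive joint table has over a billion entries, while the index table has 50. In general the Whittle index policy is a heuristic that requires the arms to be indexable, so its quality depends on the problem at hand.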