Multiple timescales for multiagent learning
David Leslie
University of Bristol
We consider versions of multiagent Q-learning in which each player
learns on a different timescale, i.e. the learning parameters of
different participants decrease to zero at different rates. We
demonstrate that this is useful both as a theoretical tool, via the
theory of multiple-timescales stochastic approximation, and as a
practical aid to convergence: there are examples for which naive
multiagent Q-learning fails to converge but multiple-timescales
Q-learning converges almost surely to a Nash distribution.
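The idea of learning on different timescales can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the game (matching pennies), the temperature, the step-size exponents, and all variable names are illustrative assumptions. Each player updates smoothed Q-value estimates, but player 2's step size decays faster than player 1's, so player 2 learns on a faster timescale and effectively tracks a quasi-stationary opponent.

```python
import math
import random

random.seed(0)

# Illustrative zero-sum game (matching pennies): player 1 wins on a match.
# The multiple-timescales idea applies to general normal-form games.
PAYOFF1 = [[1.0, -1.0], [-1.0, 1.0]]

def boltzmann(q, tau=1.0):
    """Smoothed best response: softmax over Q-values at temperature tau."""
    m = max(q)
    w = [math.exp((v - m) / tau) for v in q]
    s = sum(w)
    return [x / s for x in w]

def sample(p):
    """Draw action 0 or 1 from a two-point distribution p."""
    return 0 if random.random() < p[0] else 1

q1, q2 = [0.0, 0.0], [0.0, 0.0]
for n in range(1, 20001):
    # Multiple timescales: the two learning parameters decrease to zero
    # at different rates (exponents 0.6 and 0.9 are assumed choices).
    alpha, beta = n ** -0.6, n ** -0.9
    p1, p2 = boltzmann(q1), boltzmann(q2)
    a1, a2 = sample(p1), sample(p2)
    r1 = PAYOFF1[a1][a2]
    r2 = -r1  # zero-sum payoff for player 2
    # Standard Q-value tracking of the expected payoff of the played action.
    q1[a1] += alpha * (r1 - q1[a1])
    q2[a2] += beta * (r2 - q2[a2])

pi1, pi2 = boltzmann(q1), boltzmann(q2)
```

Here `pi1` and `pi2` are the players' smoothed best responses after learning; in this symmetric example the corresponding Nash distribution is the mixed strategy (1/2, 1/2) for each player.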