Multiple timescales for multiagent learning

David Leslie
University of Bristol

We consider versions of multiagent Q-learning in which each player learns on a different timescale, i.e. the learning parameters of the different players decrease to zero at different rates. We demonstrate that this is useful both as a theoretical tool, using the theory of multiple-timescales stochastic approximation, and as a practical aid to convergence: there are examples in which naive multiagent Q-learning fails to converge but multiple-timescales Q-learning converges to a Nash distribution almost surely.
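
To make the idea of player-specific learning rates concrete, the following is a minimal sketch rather than the algorithm analysed in this work. It assumes a stateless two-player game, Boltzmann (softmax) action selection, a plain Q-learning update, and step sizes of the form (n + 1)^(-rho) with a different exponent rho for each player. The names `learning_rate`, `boltzmann`, `run` and `MATCHING_PENNIES`, and all parameter values, are hypothetical choices made for illustration.

```python
import math
import random


def learning_rate(n, rho):
    """Step size (n + 1)^(-rho); a larger rho decays faster (slower timescale)."""
    return (n + 1) ** (-rho)


def boltzmann(q_values, temperature):
    """Softmax of the Q-values at the given temperature (assumed action rule)."""
    top = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def run(payoffs, rhos, temperature=0.2, steps=5000, seed=0):
    """Two-player stateless Q-learning with one step-size exponent per player.

    payoffs[i][a0][a1] is player i's reward when player 0 plays a0 and
    player 1 plays a1. Returns each player's final mixed strategy.
    """
    rng = random.Random(seed)
    n_actions = [len(payoffs[0]), len(payoffs[0][0])]
    q = [[0.0] * n_actions[0], [0.0] * n_actions[1]]
    for n in range(steps):
        policies = [boltzmann(q[i], temperature) for i in range(2)]
        actions = [
            rng.choices(range(n_actions[i]), weights=policies[i])[0]
            for i in range(2)
        ]
        for i in range(2):
            reward = payoffs[i][actions[0]][actions[1]]
            a = actions[i]
            # Each player uses its own decaying step size.
            q[i][a] += learning_rate(n, rhos[i]) * (reward - q[i][a])
    return [boltzmann(q[i], temperature) for i in range(2)]


# Illustrative zero-sum game: player 0 wants to match, player 1 to mismatch.
MATCHING_PENNIES = [
    [[1.0, -1.0], [-1.0, 1.0]],
    [[-1.0, 1.0], [1.0, -1.0]],
]

if __name__ == "__main__":
    # Player 1's step size decays faster, so it learns on the slower timescale.
    print(run(MATCHING_PENNIES, rhos=(0.6, 0.9)))
```

With exponents 0.6 and 0.9, the ratio of the second player's step size to the first's is (n + 1)^(-0.3), which tends to zero; this is the sense in which the two players learn on different timescales. Setting both exponents equal gives the naive single-timescale variant for comparison.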