A Generalized Reinforcement-Learning Model: Convergence and Applications

Michael L. Littman (1) and Csaba Szepesvári (2)

(1) Department of Computer Science
Brown University
Providence, RI 02912-1910 USA
mlittman@cs.brown.edu

(2) Bolyai Institute of Mathematics
"József Attila" University of Szeged
Szeged 6720 Aradi vrt. tere 1. HUNGARY
szepes@math.u-szeged.hu

January 18, 1996

Abstract

Reinforcement learning is the process by which an autonomous agent uses its experience interacting with an environment to improve its behavior. The Markov decision process (MDP) model is a popular way of formalizing the reinforcement-learning problem, but it is by no means the only way. In this paper, we show how many of the important theoretical results concerning reinforcement learning in MDPs extend to a generalized MDP model that includes MDPs, two-player games, and MDPs under a worst-case optimality criterion as special cases. The basis of this extension is a stochastic-approximation theorem that reduces asynchronous convergence to synchronous convergence.

Keywords: Reinforcement learning, Q-learning convergence, Markov games