On the Risks and Rewards of Coordination in Multiagent Reinforcement Learning Craig Boutilier University of Toronto Much emphasis in multiagent reinforcement learning (MARL) research is placed on ensuring that MARL algorithms (eventually) converge to equilibria. As in standard reinforcement learning, convergence generally requires sufficient exploration of strategy space. However, exploration often comes at a price in the form of penalties or foregone opportunities. In multiagent settings, the problem is exacerbated by the need for agents to "coordinate" their policies on equilibria, and the fact that some equilibrium points are more attractive than others. We propose a Bayesian model for optimal exploration in MARL problems that allows these exploration costs to be weighed against their expected benefits using the notion of value of information. Unlike standard RL models, this model requires reasoning about how one's actions will influence the behavior of other agents. We develop tractable approximations to optimal Bayesian exploration, and report on preliminary experiments illustrating the benefits of this approach. This describes joint work with Georgios Chalkiadakis.