On the Risks and Rewards of Coordination in Multiagent
Reinforcement Learning
Craig Boutilier
University of Toronto
Much emphasis in multiagent reinforcement learning (MARL) research is
placed on ensuring that MARL algorithms (eventually) converge to
equilibria. As in standard reinforcement learning, convergence
generally requires sufficient exploration of the strategy space. However,
exploration often comes at a price in the form of penalties or
forgone opportunities. In multiagent settings, the problem is
exacerbated by the need for agents to "coordinate" their policies on
equilibria, and the fact that some equilibrium points are more
attractive than others. We propose a Bayesian model for optimal
exploration in MARL problems that allows these exploration costs to be
weighed against their expected benefits using the notion of value of
information. Unlike standard RL models, this model requires reasoning
about how one's actions will influence the behavior of other agents.
We develop tractable approximations to optimal Bayesian exploration,
and report on preliminary experiments illustrating the benefits of
this approach.
This describes joint work with Georgios Chalkiadakis.