Reinforcement Learning to Play an Optimal Nash Equilibrium in
Coordination Markov Games
Wang XiaoFeng
CMU
In a coordination Markov game, agents have some interests in common.
However, in the presence of multiple Nash equilibria, coordination
becomes highly nontrivial even when there exist Nash equilibria
that maximize every agent's payoff. The problem is exacerbated if
the agents do not know the game (i.e., environment) and independently
receive noisy payoffs. In our research, we begin investigating this
problem in its simplest form: team Markov games. We propose optimal
adaptive learning (OAL), the first algorithm that converges to an
optimal Nash equilibrium in any team Markov game. Our approach uses
a bound function to construct a virtual game that captures the game
structure and eliminates strictly suboptimal Nash equilibria. Over the
virtual game, we implement a modified version of Young's adaptive
play, called biased adaptive play, to coordinate individual agents'
actions toward an optimal Nash equilibrium. We prove the convergence of
the algorithm and empirically evaluate its performance. Our further
research extends the algorithm beyond self-play: in a two-player team
Markov game, an OAL agent using proper bias rules will learn the
optimal coordination policy even when the other agent follows a different
coordination strategy, such as joint-action learning (JAL; Claus and
Boutilier) or independent learning (IL). More interestingly,
OAL also works in more general settings where agents do not receive
identical expected payoffs but still share some common interests. An
example is the two-player minimal-effort game.
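The virtual-game construction described above can be illustrated with a minimal sketch. This is only an illustration under simplifying assumptions: it treats a single-state team game with a known common payoff matrix, whereas in OAL the target values are learned online from noisy Q-value estimates. The function name `virtual_game` is our own illustrative choice, not from the paper.

```python
import numpy as np

def virtual_game(payoffs):
    """Illustrative virtual game for a single-state team game.

    Joint actions whose common payoff attains the maximum receive
    value 1; all others receive 0. This removes strictly suboptimal
    Nash equilibria of the original game, so coordination over the
    virtual game can only settle on an optimal joint action.
    """
    payoffs = np.asarray(payoffs, dtype=float)
    return (payoffs == payoffs.max()).astype(int)

# A 2x2 team coordination game with two pure Nash equilibria,
# (a0, b0) and (a1, b1); only (a0, b0) is optimal.
G = [[10, 0],
     [0,  5]]
VG = virtual_game(G)
# In VG, only the optimal equilibrium (a0, b0) keeps positive payoff:
# [[1, 0],
#  [0, 0]]
```

Biased adaptive play would then be run over `VG` rather than over the original game, so the suboptimal equilibrium at (a1, b1) no longer attracts the agents' best responses.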