Reinforcement Learning to Play an Optimal Nash Equilibrium in Coordination Markov Games Wang XiaoFeng CMU In a coordination Markov game, agents have some interests in common. However, in the presence of multiple Nash equilibria, coordination becomes highly nontrivial even when there exists Nash equilibria maximizing individual agent's profit. The problem is exacerbated if the agents do not know the game (i.e., environment) and independently receive noisy payoffs. In our research, we start investigating this problem in the simplest form: team Markov games. We propose optimal adaptive learning (OAL), the first algorithm that converges to an optimal Nash equilibrium for any team Markov game. Our approach takes a bound function to construct a virtual game that captures the game structure and eliminates strict suboptimal Nash equilibria. Over the virtual game, we implement a modified version of Young's adaptive play, called biased adaptive play, to coordinate individual agents' actions to an optimal Nash equilibrium. We prove the convergence of the algorithm and empirically evaluate its performance. Our further research extends the algorithm beyond self-play: In a two-player team Markov game, an OAL agent using proper biased rules will learn the optimal coordination policy with the agent taking other coordination strategy such as JAL (Claus and Boutilier) or IL. More interestingly, OAL also works in more general settings where agents do not receive the identical expected payoffs but have some common interests. An example is two-player minimal effort games.