Generalizing Multiagent Plans to New Environments in Relational MDPs
Carlos Guestrin, with Daphne Koller
Stanford University
A longstanding goal in planning research is the ability to generalize
a plan developed for some set of environments to a new but similar
environment, without having to replan. However, it is not clear, in
general, how a plan developed for one environment can be translated to
apply in another. In this talk, we present an approach to the
generalization problem based on a new framework of relational Markov
Decision Processes (RMDPs). An RMDP models the world as containing
objects of different classes. The process dynamics and rewards are
represented at the class level, and can be applied to environments
containing different sets of objects related to each other in various
ways. An object may be associated with actions, in which case it
becomes an active agent in the environment. Thus, an RMDP can model a
range of multiagent planning problems, where classes represent sets of
agents with similar abilities. We define a class-based approximate
value function that is specified in terms of classes of objects, and
can therefore be applied to multiple environments. We provide an
optimality criterion measuring the quality of a class-based value
function for an entire set of environments, and show how to
approximately optimize such a value function by using a linear
programming method combined with a sampling process over
environments. We then prove that a polynomial number of samples is
sufficient to approximate the entire space of environments. Finally,
we present a simple learning procedure for discovering classes of
objects or agents. Our experimental results show that our class-based
value function can generalize successfully to new multiagent planning
problems and that our class learning procedure improves the quality of
our approximation.
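To make the idea of a class-based value function concrete, here is a minimal sketch. It assumes that the value of an environment decomposes additively into one class-level subfunction per object. It also assumes that each subfunction is a weighted sum of features of the object's own state and of the states of the objects it is linked to. The classes `Server` and `Client`, their features and weights, and the helper `sample_environment` are all hypothetical. The weights are set by hand here, not optimized by the linear program described above. The point is that one set of class-level parameters applies unchanged to environments with different numbers of objects and different link structures.

```python
import random
from dataclasses import dataclass, field

# Hypothetical class-level weights, shared by every object of a class in
# every environment. Feature order is given by local_features below.
CLASS_WEIGHTS = {
    "Server": [1.0, 2.5],  # [server is up, server load is low]
    "Client": [0.5, 1.0],  # [client is up, all linked servers are up]
}


@dataclass
class Obj:
    cls: str                                   # class name, e.g. "Server"
    state: dict                                # this object's local state
    links: list = field(default_factory=list)  # related objects


@dataclass
class Environment:
    objects: list = field(default_factory=list)


def local_features(obj):
    # Hypothetical basis functions over an object and its linked objects.
    if obj.cls == "Server":
        return [float(obj.state["up"]), float(obj.state["load"] < 0.5)]
    return [float(obj.state["up"]),
            float(all(s.state["up"] for s in obj.links))]


def class_value(obj, weights):
    # Class-level subfunction V_C evaluated on a single object.
    return sum(w * f for w, f in zip(weights[obj.cls], local_features(obj)))


def env_value(env, weights):
    # Class-based value function: sum of class subfunctions over all objects.
    return sum(class_value(o, weights) for o in env.objects)


def sample_environment(rng, max_per_class=4):
    # Hypothetical sampler: random numbers of servers and clients, with each
    # client linked to one randomly chosen server.
    servers = [Obj("Server", {"up": rng.random() < 0.9, "load": rng.random()})
               for _ in range(rng.randint(1, max_per_class))]
    clients = [Obj("Client", {"up": rng.random() < 0.9},
                   links=[rng.choice(servers)])
               for _ in range(rng.randint(1, max_per_class))]
    return Environment(servers + clients)


def sampled_average_value(weights, n, seed=0):
    # Average value of the same class-level weights over n sampled environments.
    rng = random.Random(seed)
    return sum(env_value(sample_environment(rng), weights)
               for _ in range(n)) / n
```

For example, an environment with one healthy server and one client linked to it has value 3.5 + 1.5 = 5.0. A larger environment with two servers and three clients is scored with exactly the same `CLASS_WEIGHTS`, with no replanning. In the approach described in the abstract, these class-level parameters would instead be chosen by a linear program over a set of sampled environments.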