An optimization-based categorization of reinforcement learning environments

Michael L. Littman
Bellcore / Carnegie Mellon University
mlittman@cs.cmu.edu

Abstract

This paper proposes a categorization of reinforcement learning environments based on the optimization of a reinforcement signal over time. Environments are classified by the simplest agent that can achieve optimal reinforcement. Two parameters, h and β, abstractly characterize the complexity of an agent: the ideal (h, β)-agent uses the input information provided by the environment and at most h bits of local storage to choose an action that maximizes the discounted sum of the next β reinforcements. In an (h, β)-environment, an ideal (h, β)-agent achieves the maximum possible expected reinforcement for that environment. The paper discusses in detail the special cases in which h = 0 or β = 1, describes some theoretical bounds on h and β, and re-examines a well-known reinforcement learning environment using this new notation.
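The abstract defines the ideal (h, β)-agent only abstractly. As a minimal sketch, assume a known, finite, fully observable environment in which the input is the complete state, so the agent needs no local storage (h = 0). Also assume a discount factor `gamma`, which the abstract does not name. Under those assumptions, the action that maximizes the expected discounted sum of the next β reinforcements can be found with a β-step lookahead. The function `ideal_0_beta_action` and the toy environment `TOY_P` are hypothetical illustrations, not the paper's construction.

```python
from functools import lru_cache

# Hypothetical toy environment (not from the paper):
# TOY_P[state][action] = list of (probability, next_state, reward) outcomes.
TOY_P = {
    "A": {"stay": [(1.0, "A", 1.0)], "go": [(1.0, "B", 0.0)]},
    "B": {"stay": [(1.0, "B", 5.0)], "back": [(1.0, "A", 0.0)]},
}


def ideal_0_beta_action(P, state, beta, gamma):
    """Pick the action maximizing the expected discounted sum of the next
    `beta` rewards. Assumes the input is the full state (fully observable),
    so no memory is used (h = 0). `gamma` is an assumed discount factor."""

    @lru_cache(maxsize=None)
    def value(s, k):
        # Best expected discounted return obtainable over the next k steps from s.
        if k == 0:
            return 0.0
        return max(q(s, a, k) for a in P[s])

    def q(s, a, k):
        # Expected immediate reward plus discounted best value of the remaining k - 1 steps.
        return sum(p * (r + gamma * value(s2, k - 1)) for p, s2, r in P[s][a])

    return max(P[state], key=lambda a: q(state, a, beta))


if __name__ == "__main__":
    for beta in (1, 3):
        print(beta, ideal_0_beta_action(TOY_P, "A", beta, 0.9))
```

In this toy example, with β = 1 the agent only sees the immediate reward and stays in state A. With β = 3 and `gamma = 0.9`, moving to B (discounted value 8.55) beats staying (5.05), which illustrates how a larger β lets an agent trade immediate reinforcement for a better discounted sum.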