POMDP information page
A POMDP is a partially observable Markov decision process. It is a
model, originating in the operations research (OR) literature, for
describing planning tasks in which the decision maker does not have
complete information as to its current state. The POMDP model
provides a convenient way of reasoning about tradeoffs between actions
to gain reward and actions to gain information.
Michael Littman, Tony Cassandra, and Leslie Kaelbling have been
studying this model from an artificial intelligence (AI) perspective.
This page provides pointers to their work and also other relevant work
in this area. I highly recommend Tony's current (circa Feb 2004)
- POMDPs as a model of
planning: The paper describes the POMDP model to an AI audience
and includes a discussion of how some infinite-horizon policies can be
Anthony R. Cassandra, Leslie Pack Kaelbling, and Michael L. Littman.
Acting optimally in partially observable stochastic domains. In
Proceedings of the Twelfth National Conference on Artificial
Intelligence, Seattle, WA, 1994.
- POMDP AI: A longer,
journal-submitted version. Provides a complete description of the
witness algorithm for solving POMDPs and discusses connections to
earlier work in AI.
Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra.
Planning and acting in partially observable stochastic domains.
Submitted to Artificial Intelligence.
- POMDP algorithms: Describes a
new set of algorithms that are faster and simpler than witness. This
paper is not written for novices, but we highly recommend it for
anyone contemplating implementing a POMDP algorithm.
Anthony Cassandra, Michael L. Littman, and Nevin L. Zhang.
Incremental pruning: A simple, fast, exact algorithm for partially
observable Markov decision processes. To appear in Proceedings of
the Thirteenth Annual Conference on Uncertainty in Artificial
Intelligence (UAI--97), 1997.
- POMDP theory paper (abstract): A mathematically complete
discussion of the witness algorithm. Includes empirical comparisons
with other POMDP solution algorithms and a result showing that
dynamic-programming updates in POMDPs are NP-complete under randomized
Michael L. Littman, Anthony R. Cassandra, and Leslie Pack Kaelbling.
Efficient dynamic-programming updates in partially observable Markov
decision processes. Available as Brown University Technical Report
- Witness algorithm tech report:
Earlier Brown University tech report describing our algorithm for
Michael L. Littman. The witness algorithm: Solving partially
observable Markov decision processes. Technical Report CS-94-40,
Brown University, Department of Computer Science, Providence, RI,
approximation for POMDPs: Proceedings of a workshop on
value-function approximation. Includes slides from Michael Littman's talk summarizing the
use of learning techniques to approximate optimal value functions for
Michael Littman. Solving partially observable Markov decision
processes via VFA. In Justin A. Boyan, Andrew W. Moore, and Richard
S. Sutton, editors, Proceedings of the Workshop on Value Function
Approximation, Machine Learning Conference 1995, Technical report
- POMDP policy learning: The paper
presents results on using reinforcement-learning techniques to find
good policies for large (approximately 100-state) POMDPs.
Michael Littman, Anthony Cassandra, and Leslie Kaelbling. Learning
policies for partially observable environments: Scaling up. In Armand
Prieditis and Stuart Russell, editors, Proceedings of the Twelfth
International Conference on Machine Learning, pages 362--370, San
Francisco, CA, 1995. Morgan Kaufmann.
- POMDP policy learning:
Presents a similar technique for learning POMDP policies online. They
explore non-linear approximations to the optimal value function.
Ronald Parr and Stuart Russell. Approximating optimal policies for
partially observable stochastic domains. In Proceedings of the
International Joint Conference on Artificial Intelligence, 1995.
Lonnie Chrisman's POMDP-learning homepage: used to include pointers to
articles and web references following up on his 1992 AAAI paper on using the
Baum-Welch algorithm to infer POMDPs from experience.
- hidden state learning
bibliography: a list of papers relevant to the exploration of
POMDP models in the reinforcement learning literature. The list was
compiled by Lonnie Chrisman and Michael Littman as part of their
presentation on hidden state in the reinforcement learning workshop at
the Machine Learning conference in 1993.
- Markov model talk: The
slides for a talk by Michael Littman on the use of Markov models
(MDPs, POMDPs, and Markov games) in AI. The talk was presented at the
University of Pennsylvania, Brandeis University, and the University at
Stony Brook during the spring of 1995.
- Sequential decision-making
talk: The slides for a talk by Michael Littman on algorithms for
finding optimal policies under uncertainty. The talk was presented at
Brown University, the University of Delaware, the University of
Pittsburgh, Duke University, and the University of Pennsylvania during
the spring of 1996.
- Memoryless policies: Argues that
finding optimal policies that ignore history is NP-complete. Gives
positive results of the use of a heuristic method on a few examples
from the literature.
Michael L. Littman. Memoryless policies: Theoretical limitations and
practical results. In Dave Cliff, Philip Husbands, Jean-Arcady Meyer,
and Stewart W. Wilson, editors, From Animals to Animats 3:
Proceedings of the Third International Conference on Simulation of
Adaptive Behavior, Cambridge, MA, 1994. MIT Press.
Cassandra's links: Additional information including a proposed
standard notation for expressing POMDP problems.
for Dummies: Tony's tutorial to exact POMDP algorithms.
- Modelisation stochastique pour la robotique. Pierre LAROCHE.
September 1995. Description of the witness algorithm in French. (postscript). POMDP
in French: Les Processus de Decision Markoviens Partiellement.
For more information, contact Michael Littman: email@example.com.
Last update: Mon Feb 9 22:11:10 EST 2004.