MDP
A Markov decision process (MDP) has:
A finite set of states $S$.
A finite set of actions $A$.
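A transition function $T(s, a, s')$ for all $s \in S$, $a \in A$, $s' \in S$, giving the probability of reaching state $s'$ when action $a$ is taken in state $s$.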
A reward function $R(s, a)$ for all $s \in S$, $a \in A$.
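The four components above can be held in one small data structure. The Python below is a minimal sketch, not the notes' own representation. The class name `MDP`, the dictionary encoding, the `validate` helper and the two-state example are hypothetical choices made for illustration. The sketch also assumes that $T(s, a, s')$ is a probability, so for each $s$ and $a$ the values over all $s'$ must sum to 1.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple

State = str
Action = str


@dataclass
class MDP:
    """Hypothetical container for a finite MDP (S, A, T, R)."""
    states: Tuple[State, ...]   # the finite set S
    actions: Tuple[Action, ...]  # the finite set A
    # T[(s, a, s2)] = probability of reaching s2 after taking a in s;
    # a missing key is treated as probability 0.
    T: Dict[Tuple[State, Action, State], float]
    # R[(s, a)] = immediate reward for taking action a in state s.
    R: Dict[Tuple[State, Action], float]

    def transition(self, s: State, a: Action, s2: State) -> float:
        return self.T.get((s, a, s2), 0.0)

    def reward(self, s: State, a: Action) -> float:
        return self.R[(s, a)]

    def validate(self) -> None:
        """Check that T is a distribution over s' and R is defined for every (s, a)."""
        for s in self.states:
            for a in self.actions:
                total = sum(self.transition(s, a, s2) for s2 in self.states)
                if not math.isclose(total, 1.0):
                    raise ValueError(f"T({s}, {a}, .) sums to {total}, not 1")
                if (s, a) not in self.R:
                    raise ValueError(f"R({s}, {a}) is undefined")


# Hypothetical two-state example.
mdp = MDP(
    states=("low", "high"),
    actions=("wait", "work"),
    T={
        ("low", "wait", "low"): 1.0,
        ("low", "work", "high"): 0.8,
        ("low", "work", "low"): 0.2,
        ("high", "wait", "high"): 0.9,
        ("high", "wait", "low"): 0.1,
        ("high", "work", "high"): 1.0,
    },
    R={
        ("low", "wait"): 0.0,
        ("low", "work"): -1.0,
        ("high", "wait"): 2.0,
        ("high", "work"): 1.0,
    },
)
mdp.validate()
print(mdp.transition("low", "work", "high"))  # 0.8
```

Storing only the nonzero entries of $T$ keeps the table small when most transitions are impossible. Calling `validate` once up front catches a row of $T$ that does not sum to 1 before any algorithm relies on it.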