Acting Optimally... -> Alex
* What does the POMDP model have that the MDP does not?
* Why are "actions to gain information" meaningless in MDPs?
* Explain belief state updates.
* Why do you think POMDPs are Markov in the belief state?
* Imagine trying to formalize a robot in a maze as a POMDP. Now, picture a camera on the robot. Does this make it easier/impossible/difficult to model as a POMDP?
* What is a convenient way of representing a piecewise linear, convex function?
* How is a policy graph extracted from a set of alpha vectors?
* Which has more states, the environmental model or the optimal policy graph for the same environment?
* Consider the following policy for the tiger problem: Listen 5 times, then open whichever door the tiger has been heard behind the least. Compared to the optimal policy, describe situations in which this policy listens too long and not long enough.

Acting under Uncertainty... -> Ilan
* Why is it easy for a POMDP to represent being confused as to which corner of a building it is in? Why would this be hard for a unimodal distribution?
* How is the POMDP framework applied to robot navigation? What are the actions, observations, states, rewards? What's up with using an action like "declare goal", which has no reality to the robot?
* Would it be a "fairly simple extension" to learn the action model from experiences? How would you do it?
* In what ways is the POMDP the authors defined more succinctly representable than the standard format?
* The authors use a local "occupancy grid" to combine sensor readings to get a more accurate picture of the local surroundings. Why didn't they just make this part of the POMDP model?
* What do the human designers bring to the task to make things work? What is the analog of "feature engineering"?
* How does the belief replanning method differ from assuming the environment is deterministic?
* Why would MLS do better than QMDP? (I would not have thought it would do better.) Would rollouts help?
* The real world is clearly not deterministic, but in what ways was it *more* deterministic than the probabilistic model?
* Why does opening and closing doors present a challenge for POMDP modeling? What does this imply for its use in robotic soccer?
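
As a starting point for the belief-update and tiger-problem questions in the first set, here is a minimal Python sketch of the discrete belief update b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). The function name `belief_update`, the table layout, and the 0.85 listening accuracy (the value commonly used for the tiger problem, not stated in these notes) are assumptions made for illustration only.

```python
def belief_update(belief, action, observation, T, O):
    """Return the normalized posterior belief after taking `action`
    and receiving `observation`.

    T[action][s][s2] is P(s2 | s, action).
    O[action][s2][o] is P(o | s2, action).
    """
    n = len(belief)
    unnormalized = [
        O[action][s2][observation]
        * sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


# Tiger problem, listen action only (hypothetical encoding).
# States: 0 = tiger behind left door, 1 = tiger behind right door.
LISTEN = 0
HEAR_LEFT, HEAR_RIGHT = 0, 1

# Listening does not move the tiger.
T = {LISTEN: [[1.0, 0.0], [0.0, 1.0]]}
# Assumed: the growl is heard behind the correct door 85% of the time.
O = {LISTEN: [[0.85, 0.15], [0.15, 0.85]]}

uniform = [0.5, 0.5]
after_one = belief_update(uniform, LISTEN, HEAR_LEFT, T, O)
after_two = belief_update(after_one, LISTEN, HEAR_LEFT, T, O)
after_conflict = belief_update(after_one, LISTEN, HEAR_RIGHT, T, O)

print(after_one, after_two, after_conflict)
```

Under these assumed numbers, two agreeing observations push the belief to roughly 0.97 for one door, while two conflicting observations return it to uniform. That contrast is one way to think about when a fixed "listen 5 times" rule listens longer than needed and when it stops too early.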