IMAGE RETRIEVAL

Segmentation and clustering are used to develop an image vocabulary of "blobs". Several methods are then used to cross-correlate blobs and words based on probabilistic models. Some work better than others, although I didn't get a sense of why. Could these ideas be used in mobile robotics? Here, instead of cross-correlating words and blobs, we'd just use blobs to retrieve images with similar blobs.

VISUAL LEARNING

Once they have created the state and action space, would it be hard for a person to create a (near?) optimal policy? There are only about 350 states that need to be classified by their optimal action. Heck, labeling a few examples by hand and then using supervised learning might work well.

The phrase "avoid the local maxima of the action-value function Q" is interesting. I don't think Q has local maxima in the standard sense, but they are making a connection to the work on shaping rewards. The "state action deviation problem" sounds like they want a semi-Markov model.

The LEM stuff is also related to shaping rewards, and there are lots of examples of people using this approach in RL. It is now known how to achieve polynomial convergence time in general, so the LEM argument is a bit weaker than it could be.

How was the reward function computed? Human observers?
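
To make the hand-labeling suggestion in the VISUAL LEARNING notes concrete, here is a minimal sketch of what I have in mind: label a handful of states by hand, then give every other state the action of its nearest labeled example. Everything specific here is my own assumption and not taken from the paper: the two-number state features, the action names, the toy data, and the choice of 1-nearest-neighbor as the supervised learner.

```python
import math


def nearest_neighbor_policy(labeled, states):
    """Give each state the action of its closest hand-labeled example.

    labeled: list of (feature_tuple, action) pairs labeled by a person.
    states:  dict mapping state id -> feature_tuple.
    """
    policy = {}
    for state_id, features in states.items():
        # Pick the labeled example with the smallest Euclidean distance.
        _, action = min(labeled, key=lambda ex: math.dist(ex[0], features))
        policy[state_id] = action
    return policy


# Hypothetical toy data: each state is a pair of image features scaled
# to [0, 1]. The real state space would have about 350 entries.
labeled = [
    ((0.1, 0.5), "turn_left"),
    ((0.5, 0.5), "forward"),
    ((0.9, 0.5), "turn_right"),
]
states = {0: (0.2, 0.4), 1: (0.55, 0.9), 2: (0.8, 0.1)}

policy = nearest_neighbor_policy(labeled, states)
print(policy)
```

With only a few hundred states, a person could also inspect the resulting table directly and correct any state where the nearest-neighbor guess looks wrong.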