Lab Projects

Robotics and Control

We explore the use of machine-learning techniques to guide and control robots as they complete specific real-world tasks. For example, how would you get a robot to search a room efficiently for an object and, once it has located the object, to find it again after the robot has been moved to a new location? Another example is the control of Sony AIBO robots in the RoboCup Soccer competition. (LeRoux, Littman).

Game-Theoretic Learning

This research involves the use of reinforcement learning to develop effective strategies in multiagent settings. This includes the development of online multiplicative update methods for large and structured games, modeling cooperation in learning, applications of game theory to the analysis of machine-learning methods, learning for games that change over time, and the relationship between game theory and reinforcement learning. (Kearns, Littman, Schapire, Warmuth).
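
As a rough illustration of what an online multiplicative update looks like, the sketch below implements the textbook Hedge (multiplicative-weights) rule for a single player choosing among a few actions. It is not the lab's own method; the function name `hedge_update`, the learning rate `eta`, and the loss values are assumptions made up for this example.

```python
import math

def hedge_update(weights, losses, eta=0.5):
    """One Hedge step: scale each weight by exp(-eta * loss), then renormalize.

    weights: current probability of playing each action (sums to 1).
    losses:  loss observed for each action this round (illustrative values).
    eta:     learning rate (an assumed value, not taken from the text).
    """
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]

# Start uniform over three actions and feed in three rounds of made-up losses.
weights = [1 / 3, 1 / 3, 1 / 3]
for round_losses in ([1, 0, 1], [1, 0, 0], [0, 0, 1]):
    weights = hedge_update(weights, round_losses)

# Action 1 never incurred a loss, so it ends up with the largest weight.
print(weights)
```

Actions that keep incurring loss have their probability shrunk geometrically, which is what makes this family of updates attractive for repeated games.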

Exploration Techniques

One of the distinguishing features of reinforcement learning is the control the learner has over its own inputs. During learning, the decision maker must often choose between decisions that appear best at achieving high reward and those that may lead it to parts of the state space where even higher reward can be achieved. Choosing wisely can dramatically decrease the time needed to learn to behave effectively in a new environment. We are constructing and testing new learning algorithms to improve exploration. (LeRoux, Littman, Strehl).
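
The trade-off described above can be made concrete with epsilon-greedy action selection on a multi-armed bandit, a standard baseline rather than one of the new algorithms mentioned here. The reward probabilities, `epsilon`, and the function names are assumptions chosen for illustration.

```python
import random

def epsilon_greedy(estimates, epsilon, rng):
    """With probability epsilon explore a random action; otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])

def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Simulate a Bernoulli bandit and return per-action pull counts and value estimates."""
    rng = random.Random(seed)
    counts = [0] * len(true_means)
    estimates = [0.0] * len(true_means)
    for _ in range(steps):
        action = epsilon_greedy(estimates, epsilon, rng)
        reward = 1.0 if rng.random() < true_means[action] else 0.0
        counts[action] += 1
        # Incremental average of the rewards seen so far for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return counts, estimates

# Hypothetical reward probabilities for three actions.
counts, estimates = run_bandit([0.2, 0.5, 0.8])
print(counts, estimates)
```

Setting `epsilon` to 0 gives a purely greedy learner that can lock onto a poor action early; larger values spend more time on actions that currently look worse.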

Learning to Filter

Adaptive filtering is the task of selecting interesting or relevant objects (such as documents or images) from a stream of such objects. Selecting an irrelevant object incurs a cost, while selecting a relevant object is rewarded. In such situations, training data is frequently scarce or unavailable. The filtering problem is a challenging decision problem under uncertainty, and promising approaches must evaluate incoming objects based not only on their estimated relevance to the goal, but also on their estimated value for learning a good model of the relevant objects. We are formulating and evaluating learning approaches to modeling and solving problems in adaptive text filtering. (Fradkin, Littman, Song).
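
One simple way to read the selection criterion above is as an expected-gain test with an added bonus for informative objects. The sketch below is only that reading, not the lab's model: the function `should_select`, the default reward and cost of 1.0, and the `learning_value` bonus are all hypothetical.

```python
def should_select(p_relevant, reward=1.0, cost=1.0, learning_value=0.0):
    """Select an object when expected gain plus its value to learning is positive.

    p_relevant:     estimated probability that the object is relevant.
    reward, cost:   payoff for a relevant selection and penalty for an irrelevant one
                    (assumed values).
    learning_value: hypothetical bonus for how much the feedback would improve the model.
    """
    expected_gain = p_relevant * reward - (1.0 - p_relevant) * cost
    return expected_gain + learning_value > 0.0

print(should_select(0.6))                      # likely relevant: select
print(should_select(0.4))                      # likely irrelevant: skip
print(should_select(0.4, learning_value=0.3))  # selected for its value to learning
```

With scarce training data, the bonus term matters most early on, when feedback on a borderline object can change the relevance model substantially.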

Learning from GPS Data

Global Positioning System (GPS) technology is becoming increasingly important in many applications. With the proliferation of hand-held devices, car navigation systems, low-cost GPS sensors and mobile computing, it will soon be ubiquitous. Applications such as intelligent early-reminders and collaborative schedulers may soon be available in Personal Digital Assistants (PDAs).

Perhaps the major application of GPS technology is real-time route planning. Although existing digital maps provide time estimates for trips along a given route, they are typically based on static estimates of travel time and thus ignore other significant factors, such as traffic patterns that vary with the day of the week or the time of day. Location data collected continually by a user's GPS device can be the basis of more informed travel-time predictions and route suggestions. Good route planning that takes into account the map, day, time, and other traffic information can be formulated as a complex semi-Markov decision process, a direction we are currently pursuing. (Bhagat, Littman, Sakkis).
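
To show why time-varying travel times change the best route, the sketch below runs a time-dependent variant of Dijkstra's algorithm on a toy road network. This is a much simpler deterministic stand-in, not the semi-Markov decision process formulation mentioned above. The network, the travel-time functions, and all names are invented for illustration, and the search assumes that leaving a node later never lets you arrive earlier.

```python
import heapq

def earliest_arrival(graph, source, target, depart):
    """Return the earliest arrival time at target when leaving source at time depart.

    graph[u] is a list of (v, travel_time) pairs, where travel_time(t) gives the
    minutes needed to drive from u to v when leaving u at minute t of the day.
    """
    best = {source: depart}
    queue = [(depart, source)]
    while queue:
        t, u = heapq.heappop(queue)
        if t > best.get(u, float("inf")):
            continue  # stale queue entry
        if u == target:
            return t
        for v, travel_time in graph.get(u, []):
            arrival = t + travel_time(t)
            if arrival < best.get(v, float("inf")):
                best[v] = arrival
                heapq.heappush(queue, (arrival, v))
    return None

def highway(t):
    """Hypothetical highway: 10 minutes when clear, up to 30 at the 8:30 peak (minute 510)."""
    return 10 + max(0.0, 20 - abs(t - 510) / 3)

def back_road(t):
    """Hypothetical back-road segment: always 12 minutes."""
    return 12

graph = {"A": [("B", highway), ("C", back_road)], "C": [("B", back_road)]}

print(earliest_arrival(graph, "A", "B", 510))  # peak traffic: the back roads via C win
print(earliest_arrival(graph, "A", "B", 600))  # clear highway: the direct route wins
```

A static map estimate would always recommend the highway here; the time-dependent search switches to the back roads at the peak.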

Probabilistic Planning

Traditional planning has followed the deterministic philosophy that "the expected outcome will always be the actual outcome." PDDL, the Planning Domain Definition Language, is a useful tool for expressing such deterministic worlds. Unfortunately, the real world does not work so cleanly. Consider the simple task of driving a car from point A to point B. The intention is always to get the car from A to B, but external factors such as accidents, flat tires, and engine failures sometimes intervene, so the goal may not be achieved. As another example, more familiar from recent news, consider the Mars Rover. The rover may execute an action "pick-up-rock", intending to pick up a rock on the ground in front of it. However, there is a chance that the rock will slip out of the rover's hand, in which case the desired effect, namely holding the rock, is not achieved.

PPDDL, the Probabilistic Planning Domain Definition Language, aims to incorporate such external factors into the planning world. The language does so through probabilistic clauses. For the action of driving a car from point A to point B, a schema might look something like:

  • with probability 0.9 the car moves from A to B
  • with probability 0.05 the car gets into an accident
  • with probability 0.05 the car gets a flat tire

Such domains require different planning approaches than deterministic ones, because planners have to take into consideration that their expected outcome may not be the actual outcome. PPDDL represents a step toward moving the planning world from deterministic to probabilistic domains, motivated by the fact that the real world is filled with uncertainty. Currently, PPDDL is the domain description language being used in the Probabilistic Track of the 2004 International Planning Competition. (Asmuth, Batchis, Littman, Weissmann, Younes).
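
The driving schema listed above can be mimicked in a few lines of Python to show what a probabilistic action means to a planner or simulator. This is not PPDDL syntax; the outcome labels and the helper `sample_outcome` are hypothetical, and only the three probabilities come from the example.

```python
import random

# Outcomes of the "drive from A to B" action with their probabilities.
DRIVE_OUTCOMES = [("at-B", 0.9), ("accident", 0.05), ("flat-tire", 0.05)]

def sample_outcome(outcomes, rng):
    """Draw one outcome by walking the cumulative distribution."""
    r = rng.random()
    cumulative = 0.0
    for name, probability in outcomes:
        cumulative += probability
        if r < cumulative:
            return name
    return outcomes[-1][0]  # guard against floating-point round-off

# Simulate the action many times; the seed is arbitrary.
rng = random.Random(0)
trials = 10000
arrivals = sum(sample_outcome(DRIVE_OUTCOMES, rng) == "at-B" for _ in range(trials))
print(arrivals / trials)  # empirical success rate, close to 0.9
```

A deterministic planner would treat "at-B" as the only outcome; a probabilistic planner also has to decide what to do in the roughly one run in ten that ends some other way.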