CS 598: Learning and Sequential Decision Making
Rutgers University
Spring 2009
Michael L. Littman
Time: Thursday 1:40-4:40
Place: Rutgers, Hill 120
Semester: Spring 2009
Michael's office hours: Hill 409, Thu 1:00 and by appointment
(mlittman@cs.rutgers.edu).
TA: Shu Chen's office hours: Hill 416, Thu 12:00pm-1:00pm
(shuchen@cs.rutgers.edu)
Description: Through a combination of classic papers and more
recent work, the course explores automated decision making from a
computer-science perspective. It examines efficient algorithms, where
they exist, for single agent and multiagent planning as well as
approaches to learning near-optimal decisions from experience. Topics
will include Markov decision processes, stochastic and repeated games,
partially observable Markov decision processes, and reinforcement
learning. Of particular interest will be issues of generalization,
exploration, and representation. Each student will be expected to
present a published research paper and will participate in a group
project to create a reinforcement-learning system for this year's
international reinforcement-learning competition. Participants should
have taken a graduate-level computer-science course and should have some exposure to reinforcement
learning from a previous computer-science class or seminar; check with
the instructor if you are not sure.
Calendar
- 1/22: I'm so sorry, but I planned wrong and accepted an
invitation to speak out of the country today. I leave the
introduction in the able hands of Carlos Diuk. Please read Chapters 1
and 2 of
Littman (1996).
- 1/29: We continued with value iteration and covered linear
programming in MDPs.
- 2/5: We covered policy iteration and TD. Please read
Sutton (1988) and
Littman and Szepesvári (1996).
- 2/12:
Singh and Sutton (1996),
Wiewiora (2003),
Ng, Harada, Russell (1999).
- 2/19:
Fong (1995),
Koenig and Simmons (1993).
- 2/26:
Kearns and Singh (1998),
Littman and Stone (2004).
Optional:
Weber (1996).
- 3/5:
Littman and Stone (2004),
Gordon (1995).
Optional:
Baird (1995),
Sutton (1996),
Boyan and Moore (1995),
Tesauro (1995).
- 3/12:
Greenwald and Hall (2003),
Cassandra, Kaelbling, Littman (1994).
Optional:
Cassandra, Littman, Zhang (1997).
- 3/19: Spring break.
- 3/26:
Baird and Moore (1998),
Baxter and Bartlett (1999),
Smart and Kaelbling (2000).
- 4/2: Guest lecture.
Lagoudakis and Parr (2003),
Kocsis and Szepesvári (2006).
- 4/9:
Poupart, Vlassis, Hoey, and Regan (2006),
Ng, Kim, Jordan and Sastry (2003).
- 4/16: Interim project reports.
Chrisman (1992).
- 4/23:
Loch and Singh (1998),
Littman, Sutton and Singh (2002),
Li, Littman, and Walsh (2008).
- 4/30: Final project reports.
Dayan and Daw (2008).
- 5/7: Final.
Papers
Sutton (1990),
Kocsis and Szepesvári (2006),
Silver, Sutton, and Mueller (2008).
Optional:
Chaslot, Winands, Herik, Uiterwijk, and Bouzy (2008).
Topics and Papers
The RL survey referred to below is
Kaelbling, Littman, Moore (1996).
- Markov decision processes and algorithms.
Survey, Sections 1 and 3.
Littman, Dean, Kaelbling (1995).
- TD-lambda.
Survey, Section 4.1.
- Q-learning/Convergence.
Survey, remainder of Section 4, Section 5.
- Exploration.
Survey, Section 2.
- Repeated Games.
Hart and Mas-Colell (2000).
Greenwald and Jafari (2003).
- Generalization and convergence.
Survey, Sections 6.1, 6.2.
Baird (1995).
Gordon (1995).
- Partially observable environments.
Survey, Section 7.
- RL in POMDPs.
Chrisman (1992).
Loch and Singh (1998).
- Hierarchy.
Survey, remainder of Section 6.
Dietterich (1998).
- Policy search.
- Non-stationary environments.
- Instance-based RL.
Ormoneit and Sen (1999).
- Applications.
Survey, Sections 8 and 9.
Crites and Barto (1996),
Tesauro (1992).
RL Links
The URL for this page is
http://www.cs.rutgers.edu/~mlittman/courses/seq09/.