CS 671: Learning and Sequential Decision Making
Rutgers University
Fall 2005
Michael L. Littman
Time: Tuesday, Thursday 1:40-3:00
Place: Rutgers, Hill 482
Semester: Fall 2005
Michael's office hours: Hill 409, by appointment
(mlittman@cs.rutgers.edu).
Description: Through a combination of classic papers and more
recent work, the course explores automated decision making from a
computer-science perspective. It examines efficient algorithms, where
they exist, for single-agent and multiagent planning as well as
approaches to learning near-optimal decisions from experience. Topics
will include Markov decision processes, stochastic and repeated games,
partially observable Markov decision processes, and reinforcement
learning. Of particular interest will be issues of generalization,
exploration, and representation. Each student will be expected to
present a published research paper and will participate in a group
project to create a reinforcement-learning system for a benchmark set
of problems. Participants should have taken a graduate-level
computer-science course and should have some exposure to reinforcement
learning from a previous computer-science class or seminar; check with
the instructor if unsure.
News (most recent first)
- 11/17/05: Revised plan for remaining classes:
11/17:
Singh, James and Rudary (2004)
11/22: no class (Wed. schedule)
11/29:
Ng, Kim, Jordan and Sastry (2003)
12/1: Learning results due,
Baird and Moore (1998)
12/6: no class---workshop
12/13:
Sutton (1990)
- 11/13/05: Plan for remaining classes:
11/15:
Littman, Sutton and Singh (2002)
11/17:
Singh, James and Rudary (2004)
11/22:
Baird and Moore (1998)
11/29:
Ng, Kim, Jordan and Sastry (2003)
12/1: Learning results due,
Baxter and Bartlett (1999)
12/6: catch up, workshop finalization
12/13:
Sutton (1990)
- 11/2/05: For tomorrow:
Cassandra, Kaelbling, Littman (1994) and
Cassandra, Littman, Zhang (1997).
- 10/26/05: For tomorrow:
Greenwald and Hall (2003).
- 10/19/05: For tomorrow:
Littman and Stone (2004).
- 10/7/05: This time (somewhat late announcement!) we'll do
Koenig and Simmons (1993). Next time, we'll cover
Kearns and Singh (1998).
- 9/27/05: For 9/28,
Tesauro (1995),
Boyan and Moore (1995), and
Sutton (1996).
For 9/30 (Tom Walsh),
Baird (1995) and
Gordon (1995).
For 10/4 (Alex Strehl),
Fong (1995).
- 9/20/05: On 9/22,
Ng, Harada, Russell (1999) and
Wiewiora (2003).
- 9/19/05: Exercises due 9/20.
Tomorrow, I might also cover
Ng, Harada, Russell (1999).
- 9/15/05: Exercises due 9/20.
For next time, read
Singh and Sutton (1996).
- 9/13/05: Exercises due 9/20.
For next time, read
Littman and Szepesvári (1996).
- 9/8/05: Exercises assigned. For next time, read
Sutton (1988).
- 9/3/05: Class has moved to Hill 482 (except on 9/13, when we return
to ARC 108).
- 9/1/05: First class meets today. Looking for an alternate
classroom. Read Chapters 1 and 2 (pages 21--69 of 283) of
Littman (1996).
Topics and Papers
The RL survey referred to below is
Kaelbling, Littman, Moore (1996).
- Markov decision processes and algorithms.
Survey, Sections 1 and 3.
Littman, Dean, Kaelbling (1995).
- TD-lambda.
Survey, Section 4.1.
- Q-learning/Convergence.
Survey, remainder of Section 4, Section 5.
- Want to cover Ng's
helicopter work.
- Exploration.
Survey, Section 2.
- Repeated Games.
Hart and Mas-Colell (2000).
Greenwald and Jafari (2003).
- Generalization and convergence.
Survey, Sections 6.1, 6.2.
Baird (1995).
Gordon (1995).
- Partially observable environments.
Survey, Section 7.
- RL in POMDPs.
Chrisman (1992).
Loch and Singh (1998).
- Hierarchy.
Survey, remainder of Section 6.
Dietterich (1998).
- Policy search.
- Non-stationary environments.
- Instance-based RL.
Ormoneit and Sen (1999).
- Applications.
Survey, Sections 8 and 9.
Crites and Barto (1996),
Tesauro (1992).
RL Links
The URL for this page is
http://www.cs.rutgers.edu/~mlittman/courses/rl05/.