CS 672: Learning and Sequential Decision Making
Rutgers University
Spring 2004
Michael L. Littman
Time: Monday, Wednesday 4:30-5:50
Place: Rutgers, Hill Center 482/484
Semester: Spring 2004
Michael's office hours: Hill 409, Wednesday 2pm-3pm and by
appointment (mlittman@cs.rutgers.edu).
Description: Through a combination of classic papers and more
recent work, the course will explore automated decision making from a
computer-science perspective. It will examine efficient algorithms,
where they exist, for single-agent and multiagent planning as well as
approaches to learning near-optimal decisions from experience. Topics
will include Markov decision processes, stochastic and repeated games,
partially observable Markov decision processes, and reinforcement
learning. Of particular interest will be issues of generalization,
exploration, and representation. Each student will be expected to
present a published research paper and will participate in a group
project to create a reinforcement-learning agent to compete in a video
game environment. Participants should have taken a graduate-level
computer-science course and should have some exposure to reinforcement
learning from a previous computer-science class or seminar; check with the
instructor if unsure. This is the first time the course is being
offered.
News (most recent first)
- 4/26/04: RARS (Robot Auto Racing Simulator) example tracks due Wednesday.
- 4/20/04: RARS group presentations scheduled for May 2nd. Final
is scheduled for May 12th.
- 3/29/04: No class on 4/5/04; meet in RARS groups instead. Guest lecture
on 4/12 (Russ Greiner).
- 3/01/04: We're teaming up with the algorithms class to cover
Littman and Stone (2004) and
Hart and Mas-Colell (2000). We probably won't have time for the
second paper, though.
- 2/25/04: Here's a pointer to that
RARS report from Berkeley. Oh, here's
another one.
- 2/9/04: We're talking about video games today.
- 2/8/04: We're now meeting in Hill.
- 1/21/04: First class meets today.
Topics and Papers
Throughout the semester we will be reading sections of the RL survey
by Kaelbling, Littman, Moore (1996).
- Markov decision processes and algorithms.
Survey, Sections 1 and 3.
Littman, Dean, Kaelbling (1995).
- TD-lambda.
Survey, Section 4.1.
Sutton (1988).
- Q-learning and convergence.
Survey, remainder of Section 4 and Section 5.
Littman and Szepesvári (1996).
- Eligibility traces.
Singh and Sutton (1996).
- Shaping.
Ng, Harada, Russell (1999).
- Exploration.
Survey, Section 2.
Fong (1995).
Kearns and Singh (1998).
- Repeated games.
Littman and Stone (2004).
Hart and Mas-Colell (2000).
Greenwald and Jafari (2003).
- Generalization and convergence.
Survey, Sections 6.1, 6.2.
Baird (1995).
Gordon (1995).
- Partially observable environments.
Survey, Section 7.
Cassandra, Kaelbling, Littman (1994).
Cassandra, Littman, Zhang (1997).
- RL in POMDPs.
Chrisman (1992).
Loch and Singh (1998).
- Hierarchy.
Survey, remainder of Section 6.
Dietterich (1998).
- Policy search.
Baxter and Bartlett (1999).
- Non-stationary environments.
Sutton (1990).
- Instance-based RL.
Ormoneit and Sen (1999).
- Applications.
Survey, Sections 8 and 9.
Crites and Barto (1996).
Tesauro (1992).
The URL for this page is http://www.cs.rutgers.edu/~mlittman/courses/rl04/.