News and Events, 2007
Past Events:
- 7/26/2007:
Michael won the best
short video award in the first "AI video"
competition held at AAAI-07. He brought us back a
golden trophy. The award-winning video can be viewed
here.
- 7/16/2007:
Lihong passed his
qualifying exam successfully. Congratulations to
him. He presented his work on efficient exploration
in model-free reinforcement learning.
His talk's abstract:
Is it the right time to decide on a thesis
topic, or is it better to read more in the
literature? In artificial intelligence,
such situations are captured and termed as
the "exploration-exploitation dilemma". A
rational agent (e.g., a graduate student)
tries to maximize utility by "exploiting"
his knowledge about the world (e.g.,
deciding on a suitable and interesting thesis
topic), but this knowledge has to be
acquired by the agent himself through
"exploring" the world (e.g., reading
existing work and taking courses). Knowledge
comes at a cost, and a tradeoff between
exploration and exploitation is critical.
This talk is about balancing exploration and
exploitation in the context of reinforcement
learning, a subfield of artificial
intelligence. We will review a few ad hoc
exploration strategies, and then introduce
the recently developed PAC-MDP methods that
are provably efficient. Our focus is on
model-free PAC-MDP algorithms, which do not
use an estimated world model to explore, and
the intuitions behind them. These methods
require minimal computation and are
popular in practice.
- 5/30/2007:
Alex was the first to graduate
from the lab. He contributed to the lab's research more than
anyone else and defended his solid thesis. We'll miss him.
His talk's abstract:
Reinforcement Learning (RL)
is a powerful paradigm within the Artificial
Intelligence and Machine Learning communities.
The general problem is how to enable an agent
(computer program, robot, etc.) to maximize an
external reward signal by acting in an unknown
environment. In this thesis, we study the
problem as mathematically modeled by a
finite-state, finite-action Markov Decision
Process (MDP). In particular, our focus is on
the problem of exploration: how does an agent
determine whether to act to gain new information
(explore) or to act consistently with past
experience to maximize reward (exploit)? We
study both fundamental types of RL algorithms:
those that explicitly model the dynamics of
their environment (model-based) and those that
learn only a value or utility function over the
states of the environment (model-free). We
develop and analyze algorithms of both types
that are Probably Approximately Correct for MDPs
(PAC-MDP), meaning that, with arbitrarily high
probability, they (provably) act near-optimally
almost all of the time. We also develop lower
bounds and provide a general framework that
applies to most of the results in this thesis
and to other results that generalize the finite
MDP assumption.
- Spring 2007: Michael,
with the help of Enrique, held a light seminar on "Multi-agent Reinforcement Learning".
More information is available
here.
- Spring 2007: We had
a reading group throughout the
semester (Alex was the organizer). The main topic
was
computational learning theory. Please visit
here for schedules.
- 4/23/2007: Fancong Zen
successfully defended his PhD thesis. He talked about
"Just-in-time and Just-in-place Deadlock
Resolution". Congratulations Fancong!
- 4/5/2007:
Bethany successfully passed
her qual exam. Congratulations to her! Her research topic
was efficient exploration for mobile robots.
Her talk's abstract:
Reinforcement learning is a
branch of machine learning in which an agent
interacts with its environment and determines
the best actions to take based on its
experience. The problem differs from supervised
learning, in which the agent receives precise
feedback concerning what action it should take.
The minimal-feedback requirements make
reinforcement learning a good tool for robotic
agents in real-life environments, since
supervised training data is hard to come by in
these environments but a task description is
often a good deal easier to create. However,
real-world environments often entail a large
state space, and reinforcement learning
approaches have difficulties of their own when
learning in large domains. One of the key
problems is the determination of when to explore
the environment to learn more and when to
exploit the knowledge that has already been
obtained. The goal of my research is to find
ways to balance exploration and exploitation by
utilizing commonalities between states in the
environment and to validate these methods on
running robots.
- Spring 2007: Alex received one of the two
Graduate School Research Awards for 2006-2007.
Congratulations, Alex!
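Several of the abstracts above turn on the exploration-exploitation tradeoff. As a rough illustration only, here is a minimal sketch of epsilon-greedy action selection, a common ad hoc exploration strategy. It is not taken from any of the talks and is not a PAC-MDP method; the function name, the value estimates, and the parameter names are assumptions made for this example.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Explore: choose an action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest current value estimate.
    return max(range(len(q_values)), key=lambda a: q_values[a])

rng = random.Random(0)
estimates = [0.2, 0.8, 0.5]  # hypothetical value estimates for three actions
greedy_choice = epsilon_greedy(estimates, 0.0, rng)  # epsilon = 0 never explores
```

Setting epsilon to 0 gives a purely exploiting agent, and setting it to 1 gives a purely exploring one; values in between trade the two off at a fixed rate, regardless of how much the agent already knows.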
News and Events, 2006
- 12/9/2006:
Ali, Lihong, Tom and Michael won the first
RL competition held this year at NIPS. Their
agent came out first in the Pentathlon event, as well
as in Puddleworld.
- 11/14/2006: Carlos W. Diuk
successfully passed his qual exam. Congratulations
to him! He presented his work on model-based
hierarchical reinforcement learning.
His talk's abstract:
Model-based learning, hierarchies and state abstraction are
well-studied techniques for improving the learning efficiency of
reinforcement-learning algorithms in large state spaces.
In this talk I introduce these techniques and present a new algorithm
which brings the three ideas together. This algorithm tackles two open
problems in reinforcement learning, and provides a solution to
deterministic cases. First, I show how models can improve the learning
speed in a well-known hierarchy-based framework without disrupting
opportunities for state abstraction. Second, I show how hierarchies
can augment existing representations to achieve provably low
(polynomial) computational complexity.
Finally, I will talk about remaining open problems and future
directions in my research.
Examination Committee: Michael Littman (Chair), Ken Shan, Alex Borgida, Charles Gallistel, and Ulrich Kremer
- 11/1/2006: Alex and
Lihong received the $500 award from Google for the
best student poster at the first Machine Learning
conference held by the New York Academy of Sciences.
They presented their work on
this paper.
- Fall 2006: Alex and
Michael are co-organizing a light seminar "Learning
Theory for Sequential Decision Making" this semester
[more info]
- Fall 2006: The "Introduction
to Control Theory" course is offered this semester.
- 8/1/2006: Rati successfully defended her
master's thesis. Congratulations to her!
Her thesis abstract:
This thesis evaluates a
reinforcement-learning approach to a real-life learning
problem, that of automated stock trading. Automated
trading agents are becoming increasingly popular as
financial markets move towards electronic trading.
Completely electronic stock exchanges like INET
Electronic Communication Network provide all high volume
electronic transactions for public use. We develop
trading strategies based on several existing
reinforcement-learning algorithms and evaluate their
performance. The automated agents make trading decisions
in a stock-trading Java simulator supplied with actual
limit order trades from INET for two well-known
technology stocks over about three weeks of trading.
We analyze five trading strategies: one learned by a
straightforward Q-learning implementation; one learned
by Rmax, a model-based method that works on the
principle of "optimism under uncertainty"; one learned
by factored Rmax, a model-based method with prior
knowledge in the form of variable independence; a null
strategy that made no trades; and a hand-tuned trading
strategy. We compare against a representative hand-tuned
strategy because such strategies are often developed by
traders, yet they prove deficient at
evaluating the salient features of the current order
books. We formulate the problem as an MDP in which the
transition probabilities are represented by a Dynamic
Bayesian Network (DBN). The results indicate that the
model-based factored Rmax approach, which exploits the
inherent structure in the problem, is most effective at
learning to make profitable decisions.
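The "optimism under uncertainty" principle mentioned in the abstract above can be sketched as follows. This is a simplified illustration, not the thesis's implementation: it assumes a discounted setting, and the function name, the visit threshold, and the parameter names are hypothetical. A state-action pair that has been tried fewer than a threshold number of times is treated as if it always yielded the maximum reward, which makes under-explored choices look attractive.

```python
def optimistic_value(visit_count, estimated_value, known_threshold, r_max, gamma):
    """Return an optimistic value for a rarely tried state-action pair."""
    if visit_count < known_threshold:
        # Unknown pair: assume the maximum reward r_max is received forever,
        # which sums to r_max / (1 - gamma) under discount factor gamma.
        return r_max / (1.0 - gamma)
    # Known pair: trust the value estimated from the learned model.
    return estimated_value

unknown = optimistic_value(0, 0.3, known_threshold=5, r_max=1.0, gamma=0.9)
known = optimistic_value(5, 0.3, known_threshold=5, r_max=1.0, gamma=0.9)
```

An agent that acts greedily with respect to these values is drawn toward pairs it has not yet tried often enough, so exploration happens without a separate random-action mechanism.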
News and Events, 2005
Past Events:
- Fall 2005: Michael
taught the Learning and Sequential Decision Making
course this semester.
Description:
Through a combination of classic papers and
more recent work, the course will explore automated decision
making from a computer-science perspective. It will examine
efficient algorithms, where they exist, for single agent and
multiagent planning as well as approaches to learning
near-optimal decisions from experience. Topics will include
Markov decision processes, stochastic and repeated games,
partially observable Markov decision processes, and
reinforcement learning.
- 8/29/2005:
Nick Jong visited our lab. He's a fourth-year
PhD student at the University of Texas at Austin and is working on
transfer learning. He presented his paper "State Abstraction
Discovery from Irrelevant State Variables", which was published
in IJCAI-05 this year.
- Summer 2005: We had a reinforcement learning reading
group, held every Thursday during the summer; we discussed recent
advances from the hierarchical RL community.
- 7/11/2005: Lihong presented the paper "Lazy
Approximation for Solving Continuous Finite-Horizon MDPs" at
AAAI-05 in Pittsburgh.
- 5/7/2005:
Elliot Ludvig visited our lab. He gave a talk
about different aspects of RL.
Abstract: For
most animals, rewarding stimuli exert multiple influences on
behaviour. Rewards can selectively enhance actions (operant
conditioning), change the value and salience of neutral
stimuli (classical conditioning), and alter immediate
motivational and affective states. On interval schedules of
reinforcement, rewards show periodicity, and animals will
generally time their responses to coincide with food
availability. The first part of this talk presents results
from a series of empirical studies with rats and pigeons
that elucidate the mechanisms through which animals respond
to dynamically-changing sequences of intervals. The latter
portion explores how magnitude of reinforcement changes
timing, drawing on results from a study using Brain
Stimulation Reward (BSR) in rats. Both reward magnitude and
interval duration produce their effects on timed responses
through a combination of unlearned, immediate after-effects
and a learned expectation of upcoming rewards. I conclude
with the suggestion that reinforcement learning algorithms
may benefit from the incorporation of both these aspects of
rewards.
- 4/29/2005:
People involved in the "intrinsically
motivated learning" project came to Rutgers from the University of
Alberta, the University of Michigan, and the University of
Massachusetts Amherst. We had a full day of talks,
discussions, and demos.
The agenda for the meeting was:
[9:00] Welcome & Introduction: Michael, Andy, Satinder, Rich
[9:10] Rich: some as yet unspecified words of wisdom ...
[9:45] Vishal and Satinder: Intrinsically Motivated AIBO experiments
[10:15] Ozgur and Andy: Algorithms for Intrinsic Motivation
[10:45] General Discussion
[11:30] Ali and Michael lead discussion of Exploration and Partial Observability
[12:00-1:00] Ali and Michael's discussion continues through lunch (they don't get to eat :-))
[1:00] Alex: Interval Estimation and Exploration
[1:30] Carlos: Exploration issues
[2:00] Lihong and Tom: Issues for Continuous State Spaces
[2:30] Bethany: Latent Model Learning
[3:00] General Discussion: perhaps focus on the Transfer Proposal
[4:30] Demos (Tom's button pusher, Vishal's ball retriever based on options, Bethany's hill climber robot, and Ali's door passer).
- 3/25/2005:
In celebration of the
new enhanced RL3 lab, we hosted an open house party.
- 2/7/2005: Prof. Marie desJardins was the guest of RL3 and gave a
talk titled "Annotating Clustering Constraints with Feature
Relevance Information".
Abstract: Constrained clustering uses membership
constraints between pairs of data points to improve the
performance of clustering algorithms [2]. Previous work in this area has focused on two classes of
binary constraints: MUST-LINK constraints (which indicate that two data points
should be placed in the same cluster) and CANNOT-LINK
constraints (which indicate that two data points should be
placed in different clusters). One recent constrained
clustering algorithm, MPCK-MEANS [1], integrates such
constraints with a metric learning approach, yielding very
good performance in a variety of domains. In this talk, I
will describe our ongoing research to extend MPCK-MEANS by
annotating the constraints with information about feature
relevance. Specifically, each constraint may include a
feature vector, indicating the degree to which a user (or
oracle) believes that a particular feature is important for
generating the MUST-LINK or CANNOT-LINK constraint that is
associated with that pair of data points. I will present a
method for automatically generating feature annotations
(simulating a domain expert), and will describe our initial
experimental results, which show that feature annotations
can improve clustering performance for a given number of
constraints.
[1] Mikhail Bilenko, Sugato Basu, and Raymond J. Mooney, "Integrating constraints and metric learning in semi-supervised clustering." In Proceedings of the 21st International Conference on Machine Learning (ICML-2004), pp. 81-88, Banff, Canada, July 2004.
[2] Kiri Wagstaff, "Intelligent Clustering with Instance-Level Constraints." Cornell University Computer Science Ph.D. dissertation, 2002.
Bio: Dr. Marie desJardins
is an assistant professor in the Department of Computer
Science and Electrical Engineering at the University of
Maryland, Baltimore County. Prior to joining the faculty in 2001, Dr. desJardins was a
senior computer scientist at SRI International in Menlo
Park, California. Her research is in artificial
intelligence, focusing on the areas of machine learning,
multi-agent systems, planning, interactive AI techniques,
information management, reasoning with uncertainty, and
decision theory.
News and Events, 2004
Past Events:
- 12/13/2004:
RL3 hosted Prof. Lisa Meeden, who gave a
talk titled "Creating Intrinsic Value Systems
for doing Reinforcement Learning in
Developmental Robotics".
Abstract: Developmental robotics moves away from task-specific design, in which a robot is programmed to accomplish a particular pre-defined goal, and instead explores the kinds of capabilities that a robot can discover through self-motivated actions based on its own body and the dynamic structure of its environment. I will review the ways in which intrinsic value systems have been used to implement reinforcement learning so as to induce self-motivated actions. Then I will describe our own approach, which is based on the competition between two innate pressures within the developing robot: the need to accurately predict the environment while simultaneously trying to seek out novelty in the environment.
- 9/14/2004:
A new
robotics conference has been announced.
Michael is on the program committee.
- 8/16/2004:
Alex and Michael got
a paper on MBIE accepted into
ICTAI. Way to go Alex!
- 6/26/2004:
Dave presents at the International Workshop on
Learning Classifier Systems. Another first for
the lab!
- 5/18/2004:
Michael's keynote at
The Seventeenth Canadian Conference on
Artificial Intelligence.
- 3/26/2004:
We will be presenting some work at the computer
science department's
open house.
- 3/11/2004:
Our first lab paper was accepted to AAAI-04, "An
Instance-based State Representation for Network
Repair" (Littman, Ravi, Fenson, Howard)!
- 1/1/2004:
Michael is teaching
Learning and Sequential Decision Making.