|
|
|
|
Upcoming Talks: |
|
|
|
Past Talks: |
-
Date:
12/6/2007 Title:On a network creation game
Speaker:
Yishay Mansour , Google & Tel Aviv University
Abstract: A network creation game abstracts a network construction by selfish agent who build a network by buying links to each other. Each player pays a fixed cost per link, and suffers an additional communication cost (which is the sum of distances to the it is interested in communicating with players). The uniform model (where a player is interested to the distances to all other players) was first proposed in Fabricant et al 2003, and we will also consider a non-uniform version of the game.
The talk will focus on understanding the equilibrium structure in a network creation games. First, we will show that a pure Nash Equilibrium exists (and for the uniform model, even a strong equilibrium exists). Our main focus would be on the quality of the resulting Nash equilibrium, as modeled by the Price of Anarchy. We will bound worse case ratio of the cost of some Nash equilibrium to that of the social optimum.
-
Date:
11/16/2007 Title: Linear Prioritized Sweeping
Speaker:
Alborz Geramifard, Alberta University Abstract: We consider the problem of efficiently learning optimal control policies and value functions over large state spaces in an online setting in which estimates must be available after each interaction with the world. This paper develops an explicitly model-based approach extending Sutton's Dyna architecture to linear function approximation. Dyna-style planning proceeds by generating imaginary experience from the world model and then applying model-free reinforcement learning algorithms to the imagined state transitions. we also introduce the first version of linear prioritized sweeping (LPS). In the control setting, we found empirically on the Mountain Car and Boyan Chain problems that LPS performed better than Sarsa.
-
Date:
11/15/2007 Title: Continuous-State POMDPs with Hybrid Dynamics
Speaker:
Emma Brunskill, MIT Abstract: Continuous-state POMDPs
provide a natural representation for a variety of tasks, including many in robotics. However, existing continuous-state POMDP approaches
are limited by their reliance on a single linear model to represent the world dynamics. I will introduce a new switching-state (hybrid)
dynamics model that can represent multi-modal state-dependent dynamics. The talk will present a new point-based POMDP planning algorithm
for solving continuous-state POMDPs using this dynamics model, and also provide a constrained optimization approach for approximating the
value function as a mixture of a bounded number of Gaussians. I will present results on two example simulation problems and demonstrate
that when different degrees of state accuracy are needed to accomplish a task, our hybrid continuous-state approach outperforms a standard
discrete state technique. I will also briefly mention my current work on continuous state problems.
This is joint work with Nicholas Roy, Leslie Kaelbling and Tomas Lozano-Perez.
-
Date:
7/6/2007 Title: A Direct, Bayesian
Approach to Continuous Global Function Optimization
Speaker:
Kevin Seppi,
Brigham Young University Abstract: While
continuous global optimization is often successfully tackled by such popular
approaches as simulated annealing, genetic algorithms, and particle swarm
optimization, these methodologies make sampling decisions based mostly on
heuristics and intuition. Therefore, the very way in which these algorithms are
designed makes it difficult to analyze them, to improve them, or to incorporate
additional information about the target function.
The focus of this presentation is a Bayesian and decision-theoretic model of
global optimization that is at once easy to understand, possible to rigorously
analyze, reasonable to use, and infuriatingly consistent with No Free Lunch
Theorems for Optimization.
-
Date:
7/30/2007 Title: Two Methods for Learning
Manifold Controllers Speaker:
Tom Erez, Washington University of
St. Louis Abstract: The curse of
dimensionality plagues the application of reinforcement learning (RL) in
continuous, nonlinear domains (such as robot walking): in problems with more
than 10 dimensions, it is almost impossible to approximate the optimal policy
over the entire volume of state space. In certain tasks, it is possible to
represent the policy only along certain lower-dimensional manifolds embedded in
state space, a technique we call Manifold Control. In this talk, I will present
its usefulness in high-dimensional periodic tasks such as robot walking and
swimming, and discuss two different methods for learning manifold control from a
scalar reward signal. As opposed to classic RL methods (such as Q-learning), the
computational complexity of both methods is polynomial in the domain's
dimensionality, rather than exponential.
The demos I discuss are: A simulation of a multilink swimmer (of up to 34 state
dimensions and 15 action dimensions) that learned to bring its nose to a moving
goal position:
[watch] Two simulations of a 10D kneed walker walking on stilts and on
uneven terrain:
[watch]
-
Date:
7/6/2007 Title: A Direct, Bayesian
Approach to Continuous Global Function Optimization
Speaker:
Kevin Seppi,
Brigham Young University Abstract: While
continuous global optimization is often successfully tackled by such popular
approaches as simulated annealing, genetic algorithms, and particle swarm
optimization, these methodologies make sampling decisions based mostly on
heuristics and intuition. Therefore, the very way in which these algorithms are
designed makes it difficult to analyze them, to improve them, or to incorporate
additional information about the target function.
The focus of this presentation is a Bayesian and decision-theoretic model of
global optimization that is at once easy to understand, possible to rigorously
analyze, reasonable to use, and infuriatingly consistent with No Free Lunch
Theorems for Optimization.
-
Date:
4/13/2007 Title: Agnostic Active Learning
Speaker: John
Langford, Yahoo! Research Abstract:
Given only an assumption of IID samples, standard
batch learning requires O(1/e^2) labeled samples to
find an e-optimal hypothesis within a set of
hypotheses. The fundamental promise of active
learning is that with only O(log(1/e)) carefully
chosen labeled samples, we can make the same
guarantee. Several examples have been worked out
showing that for specific (learning problem,
hypothesis space) pairs, this dependence is
possible. One frustrating aspect of active learning,
is that it tended to require noise-free (or at least
effective noise free) learning problems. I'll
present an algorithm A^2 (Agnostic Active) which
removes this constraint. Under an assumption of only
IID samples, A^2 has the following guarantees: - For every (learning problem, hypothesis space)
pair, the returned hypothesis is e-optimal with high
probability. - For every (learning problem, hypothesis space)
pair, the number of labeled samples required is no
worse than batch learning. - For known pairs where O(log(1/e)) labeled examples
are required without noise, we also achieve an
O(log(1/e)) dependence.
-
Date:
2/28/2007 Title: Modeling Science: Topic models of Scientific Journals and Other Large Text Databases
Speaker: David Blei, Princeton University
Abstract: A surge of recent research in machine learning and statistics has developed new techniques for finding patterns of words in document collections using hierarchical probabilistic models. These models are called "topic models" because the word patterns often reflect the underlying topics that are combined to form the documents; however topic models also naturally apply to such data as images and biological sequences.
After reviewing the basics of topic modeling, I will describe two related lines of research in this field, which extend the current state of the art.
First, while previous topic models have assumed that the corpus is static, many document collections actually change over time: scientific articles, emails, and search queries reflect evolving content, and it is important to model the corresponding evolution of the underlying topics. For example, an article about biology in 1885 will exhibit significantly different word frequencies than one in 2005. I will describe probabilistic models designed to capture the dynamics of topics as they evolve over time.
Second, previous models have assumed that the occurrence of the different latent topics are independent. In many document collections, the presence of a topic may be correlated with the presence of another. For example, a document about sports is more likely to also be about health than international finance. I will describe a probabilistic topic model which can capture such correlations between the hidden topics.
In addition to giving quantitative, predictive models of a corpus, topic models provide a qualitative window into the structure of a large document collection. This perspective allows a user to explore a corpus in a topic-guided fashion. We demonstrate the capabilities of these new models on the archives of the journal Science, founded in 1880 by Thomas Edison. Our models are built on the noisy text from JSTOR, an online scholarly journal archive, resulting from an optical character recognition engine run over the original bound journals.
-
Date:
2/13/2007 Title: When success is central in stochastic models
Speaker: Judy Goldsmith, University of Kentucky
Abstract: I will discuss an ongoing research project to build decision-support software for advising,
and how anthropologists and welfare case managers helped us create a new knowledge representation
formalism. I will give brief introductions to the US "welfare-to-work" system; Markov decision
processes with action results; "bowtie fragments" of dynamic Bayesian networks; a knowledge
elicitation method for bowtie fragments, and the social interactions that led to bowtie fragments.
I will tie all of this up with a brief discussion of the Social Construction of Technology theory.
-
Date:
4/5/2006 Title: Utility based Data Mining
Speaker:
Gary Weiss
Abstract: Early work in
predictive data mining did not address the complex
circumstances in which models are built and applied.
It was assumed that a fixed amount
of training data were available-- at no cost-- and
only simple objectives, namely predictive accuracy,
were considered. Over time it became clear that
these assumptions were unrealistic and that the
economic utility of acquiring training data,
building a model, and applying the model had to be
considered. This led to work on active learning and
cost-sensitive learning. However, most of this work
only factored in utility considerations in one stage
of the data mining process (i.e., data acquisition,
model building, or applying the model). Recently,
several colleagues and I proposed the term
utility-based data mining (UBDM) to encompass all
work that considers utility during the data mining
process. The purpose of introducing this new term is
to encourage researchers and practitioners to focus
their attention on maximizing utility throughout the
entire data mining process. In this talk I will give
a brief overview of utility-based data mining,
discuss why it is important, and review how existing
research fits into this framework. I will also
discuss what UBDM has to offer to reinforcement
learning and vice-versa. For those who are interested, papers from the First
International Workshop on Utility-Based Data Mining
are available on-line.
-
Date:
3/29/2006 Title: Learning and Planning in POMDPs
Speaker:
Eyal Even-Dar
Abstract:
We consider the most realistic reinforcement learning setting in which an agent starts in an unknown environment (the POMDP) and must follow one continuous and uninterrupted chain of experience with no access to ``resets'' or ``offline'' simulation. We provide algorithms for general POMDPs that obtain near optimal average reward. One algorithm we present has a convergence rate which depends exponentially on a certain horizon time of an optimal policy, but has no dependence on the number of (unobservable) states. The main building block of our algorithms is an implementation of an approximate reset strategy, which we show always exists in every POMDP.
We also relate POMDPs to multiplicity automata -- showing that POMDPs can be represented by multiplicity automata with no increase in the representation size. Furthermore, we show that the size of the multiplicity automaton is equal to the rank of the predictive state representation. Therefore, we relate both the predictive state representation and POMDPs to the well-founded multiplicity automata literature.Based on the multiplicity automata representation, we provide a planning algorithm which is exponential only in the multiplicity automata rank rather than the number of states of the POMDP.
-
Date:
3/22/2006 Title: Privacy Unmasked: the
Separation of Information from Effect Speaker:
Scott Stornetta (Equipoise) Abstract:
Privacy issues have a large
overlap with work in computer science, from security
to social networking, spam to government requests
for search engine information to a consumer's
purchase history at Amazon. At the same time, there
is often a sense that protecting one's privacy is a
gradually losing battle, as information about
individuals becomes more and more widespread.
We suggest a different framework for the privacy
question, which separates the information at stake
from its unwanted effect. Viewed in this light, one
can recast many of today's privacy problems in a way
that suggests win-win solutions. We'll illustrate
this theory with specific examples for new
initiatives in networking, software, and online
commerce.
-
Date:
11/16/2005 Title: Programmable Reinforcement Learning
Agents Speaker:
David Andre
Abstract:
In this talk I will
discuss the use of partial programming to create a system for agent design that allows the programmer
to specify what they know, hint at what they suspect
using shaping, and leave unspecified that which they
don't know; the system then optimally completes the
program through experience and takes advantage of
the hierarchical structure of the specified program
to speed learning. Specifically, I will present the
ALisp system, a Lisp-based high-level partial
programming language that allows the programmer to
express his or her prior knowledge in a concise
manner and to constrain the set of policies
considered by the learning subsystem. Online
learning algorithms for ALisp can be proved to
converge to an optimal solution of the SMDP and thus
to an optimal completion of the partial program. I
will also present methods for exploiting the
modularity of ALisp to speed the learning process
using state abstraction. Empirical and theoretical
results on ALisp with state abstraction will be
presented, including the application of ALisp to a
difficult continuous version of the Taxi problem. A
thorny theoretical issue with respect to proving
online convergence with state abstraction will be
also discussed.
-
Date:
11/15/2005 Title: Winning the DARPA Grand Challenge
Speaker:
Sebastian Thrun
Abstract: The DARPA Grand
Challenge has been the most significant challenge to
the robotics community in more than a decade. The
task was to build an autonomous robot capable of
traversing 132 miles of unrehearsed desert terrain
in less than 10 hours. In 2004, the best robot only
made 7.3 miles. In 2005, Stanford won the challenge
and the $2M prize money by successfully traversing
the course in less than 7 hours. This talk,
delivered by the leader of the Stanford Racing Team,
will provide insights into the software architecture
of Stanford's winning robot. The robot massively
relied on machine learning and probabilistic
modeling for sensor interpretation and control. The
speaker will explain some of the basic algorithms
that made this victory possible, and share some of
the excitement characterizing this historic event.
-
Date:
11/2/2005 Title: Probabilistic Policy
Reuse in Reinforcement Learning Speaker:
Fernando Fernandez Rebollo
Abstract:
We contribute
Policy Reuse as a technique to improve a
reinforcement learner with guidance from past
learned similar policies. Our method relies on using
the past policies in a novel way as a probabilistic
bias where the learner faces three choices: the
exploitation of the ongoing learned policy, the
exploration of random unexplored actions, and the
exploitation of past policies. We introduce the
algorithm and its major components: an exploration
strategy to include the new reuse bias, and a
similarity metric to estimate the similarity of past
policies with respect to a new one. We provide
empirical results demonstrating that Policy Reuse
improves the learning performance over different
strategies that learn without reuse. Policy Reuse
further contributes the learning of the structure of
a domain. Interestingly and almost as a side effect,
Policy Reuse identifies classes of similar policies
revealing a basis of ”eigen-policies” of the domain.
In general, Policy Reuse contributes to the overall
goal of lifelong reinforcement learning, as (i) it
incrementally builds a policy library; (ii) it
provides a mechanism to reuse past policies; and
(iii) it learns an abstract domain structure in
terms of eigen-policies of the domain. This is a
joint work with Prof. Manuela Veloso.
-
Date:
5/7/2005 Title: When the past meets the future:
Reaction and expectation on interval schedules of reinforcement
Speaker: Elliot Ludvig
Abstract:
For
most animals, rewarding stimuli exert multiple influences on
behaviour. Rewards can selectively enhance actions (operant
conditioning), change the value and salience of neutral
stimuli (classical conditioning), and alter immediate
motivational and affective states. On interval schedules of
reinforcement, rewards show periodicity, and animals will
generally time their responses to coincide with food
availability. The first part of this talk presents results
from a series of empirical studies with rats and pigeons
that elucidate the mechanisms through which animals respond
to dynamically-changing sequences of intervals. The latter
portion explores how magnitude of reinforcement changes
timing, drawing on results from a study using Brain
Stimulation Reward (BSR) in rats. Both reward magnitude and
interval duration produce their effects on timed responses
through a combination of unlearned, immediate after-effects
and a learned expectation of upcoming rewards. I conclude
with the suggestion that reinforcement learning algorithms
may benefit from the incorporation of both these aspects of
rewards.
-
Date:
2/7/2005
Title: Annotating Clustering Constraints with
Feature Relevance Information
Speaker:
Dr.
Marie desJardins
Abstract:
Constrained clustering
uses membership constraints between pairs of data
points to improve the performance of clustering
algorithms [2]. Previous work in this area has focused on two classes of
binary constraints: MUST-LINK constraints (which indicate that two data points
should be placed in the same cluster) and CANNOT-LINK
constraints (which indicate that two data points should be
placed in different clusters). One recent constrained
clustering algorithm, MPCK-MEANS [2], integrates such
constraints with a metric learning approach, yielding very
good performance in a variety of domains. In this talk, I
will describe our ongoing research to extend MPCK-MEANS by
annotating the constraints with information about feature
relevance. Specifically, each constraint may include a
feature vector, indicating the degree to which a user (or
oracle) believes that a particular feature is important for
generating the MUST-LINK or CANNOT-LINK constraint that is
associated with that pair of data points. I will present a
method for automatically generating feature annotations
(simulating a domain expert), and will describe our initial
experimental results, which show that feature annotations
can improve clustering performance for a given number of
constraints. [1] Mikhail Bilenko, Sugato Basu, and Raymond
J. Mooney, "Integrating constraints and metric learning in
semi-supervised clusetring." In Proceedings of the 21st International Conference on
Machine Learning (ICML-2004), pp. 81-88, Banff, Canada, July
2004. [2] Kiri Wagstaff, "Intelligent Clustering with
Instance-Level Constraints." Cornell University Computer
Science Ph.D. dissertation, 2002. Bio:Dr. Marie desJardins
is an assistant professor in the Department of Computer
Science and Electrical Engineering at the University of
Maryland, Baltimore County. Prior to joining the faculty in 2001, Dr. desJardins was a
senior computer scientist at SRI International in Menlo
Park, California. Her research is in artificial
intelligence, focusing on the areas of machine learning,
multi-agent systems, planning, interactive AI techniques,
information management, reasoning with uncertainty, and
decision theory.
-
Date:
Monday, December 13, 2004
Title: Creating Intrinsic Value Systems for
doing Reinforcement Learning in
Developmental Robotics
Speaker: Dr.
Lisa Meeden
Abstract: Developmental robotics is a
move away from task-specific design where a
robot is programmed to accomplish a particular
pre-defined goal and instead
explores the kinds of capabilities that a robot can
discover through
self-motivated actions based on its own body and the
dynamic structure of
its environment. I will review the ways in which
intrinsic value systems
have been used to implement reinforcement learning
so as to induce
self-motivated actions. Then I will describe our own
approach that is based
on the competition between two innate pressures
within the developing robot:
the need to accurately predict the environment while
simultaneously trying
to seek out novelty in the environment.
This work has been done in collaboration with
Douglas Blank and Deepak
Kumar at Bryn Mawr College, and James Marshall at
Pomona College.
-
Date:
Monday, 11/29/2004
Title: What is Thought?
Speaker:
Eric B. Baum
Abstract: What is thought, and how can it
arise from a material brain,
which in principle could be simulated by an
electronic computer?
How are understanding and consciousness produced by
physical
processes? How exactly do nature and nurture
interact? What is
language, why did it take 4 billion years to evolve,
and how are words
related to meaning? What is meaning? What are will
and awareness?
A model is presented that addresses these questions
in a concrete
and principled way. The computational learning
literature has
explained concept learning as stemming from a
formalized Occam's
razor: a compact program (Occam explanation)
consistent with
enough examples of a concept yields generalization
to
new examples. These results are extrapolated to the
conjecture that
a compact enough program that solves enough
reinforcement learning
trials exploits the underlying structure of its
world and will
generalize to solve new problems. The ramifications
of this
conjecture are discussed and shown to answer the
above questions
in a way that brings to bear results of complexity
theory, is
consistent with data from a variety of fields, and
makes empirical
predictions.
As an illustrative example, results are presented
from experiments
in which the evolution of an economy of programs is
simulated.
Two principles, "conservation of money" and
"property rights"
are imposed on the economy. The economy is seen to
evolve,
starting from random code, to solve a collection of
reinforcement
learning examples. As predicted by the extended
Occam's razor,
the economy then generalizes to solve new problems.
But moreover,
these experiments provide evidence illuminating how
and when a
social system can organize itself to take
appropriate decisions.
This talk is based on the eponymous book (MIT Press,
2004).
-
Date:
Monday, 10/18/2004
Title: Model-guided Mining and Discovery
Speaker: Dr. Rafael Alonso
Abstract: In this talk I will describe the
research done at Sarnoff developing
adaptive models to tailor the behavior of search and
knowledge
discovery systems to the needs of individual users.
I will describe
our development of dynamically adaptive models that
capture user
intent and entity behavior, and can dynamically
adapt by both
expanding model content (with the aid of an
ontology) and by refining
measures of user interest (based on implicit and
explicit user
feedback). I will describe previous and current work
in the context
of several applications and present experimental
evaluation results.
I will also describe how these models are being used
to develop an
anomaly detection system for surveillance
applications, and outline
our recent work in threat assessment algorithms. The
talk will
conclude with a short description of the commercial
applications of
our work.
|
|
|
|
|