Upcoming Talks:

 

Past Talks:
  • Date: 12/6/2007
    Title:On a network creation game
    Speaker: Yishay Mansour , Google & Tel Aviv University
    Abstract: A network creation game abstracts a network construction by selfish agent who build a network by buying links to each other. Each player pays a fixed cost per link, and suffers an additional communication cost (which is the sum of distances to the it is interested in communicating with players). The uniform model (where a player is interested to the distances to all other players) was first proposed in Fabricant et al 2003, and we will also consider a non-uniform version of the game.
    The talk will focus on understanding the equilibrium structure in a network creation games. First, we will show that a pure Nash Equilibrium exists (and for the uniform model, even a strong equilibrium exists). Our main focus would be on the quality of the resulting Nash equilibrium, as modeled by the Price of Anarchy. We will bound worse case ratio of the cost of some Nash equilibrium to that of the social optimum.

  • Date: 11/16/2007
    Title: Linear Prioritized Sweeping
    Speaker: Alborz Geramifard, Alberta University
    Abstract: We consider the problem of efficiently learning optimal control policies and value functions over large state spaces in an online setting in which estimates must be available after each interaction with the world. This paper develops an explicitly model-based approach extending Sutton's Dyna architecture to linear function approximation. Dyna-style planning proceeds by generating imaginary experience from the world model and then applying model-free reinforcement learning algorithms to the imagined state transitions. we also introduce the first version of linear prioritized sweeping (LPS). In the control setting, we found empirically on the Mountain Car and Boyan Chain problems that LPS performed better than Sarsa.

  • Date: 11/15/2007
    Title: Continuous-State POMDPs with Hybrid Dynamics
    Speaker: Emma Brunskill, MIT
    Abstract: Continuous-state POMDPs provide a natural representation for a variety of tasks, including many in robotics. However, existing continuous-state POMDP approaches are limited by their reliance on a single linear model to represent the world dynamics. I will introduce a new switching-state (hybrid) dynamics model that can represent multi-modal state-dependent dynamics. The talk will present a new point-based POMDP planning algorithm for solving continuous-state POMDPs using this dynamics model, and also provide a constrained optimization approach for approximating the value function as a mixture of a bounded number of Gaussians. I will present results on two example simulation problems and demonstrate that when different degrees of state accuracy are needed to accomplish a task, our hybrid continuous-state approach outperforms a standard discrete state technique. I will also briefly mention my current work on continuous state problems.
    This is joint work with Nicholas Roy, Leslie Kaelbling and Tomas Lozano-Perez.

  • Date: 7/6/2007
    Title: A Direct, Bayesian Approach to Continuous Global Function Optimization
    Speaker
    : Kevin Seppi, Brigham Young University
    Abstract: While continuous global optimization is often successfully tackled by such popular approaches as simulated annealing, genetic algorithms, and particle swarm optimization, these methodologies make sampling decisions based mostly on heuristics and intuition. Therefore, the very way in which these algorithms are designed makes it difficult to analyze them, to improve them, or to incorporate additional information about the target function.

    The focus of this presentation is a Bayesian and decision-theoretic model of global optimization that is at once easy to understand, possible to rigorously analyze, reasonable to use, and infuriatingly consistent with No Free Lunch Theorems for Optimization.


     

  • Date: 7/30/2007
    Title: Two Methods for Learning Manifold Controllers
    Speaker: Tom Erez, Washington University of St. Louis
    Abstract: The curse of dimensionality plagues the application of reinforcement learning (RL) in continuous, nonlinear domains (such as robot walking): in problems with more than 10 dimensions, it is almost impossible to approximate the optimal policy over the entire volume of state space. In certain tasks, it is possible to represent the policy only along certain lower-dimensional manifolds embedded in state space, a technique we call Manifold Control. In this talk, I will present its usefulness in high-dimensional periodic tasks such as robot walking and swimming, and discuss two different methods for learning manifold control from a scalar reward signal. As opposed to classic RL methods (such as Q-learning), the computational complexity of both methods is polynomial in the domain's dimensionality, rather than exponential.

    The demos I discuss are: A simulation of a multilink swimmer (of up to 34 state dimensions and 15 action dimensions) that learned to bring its nose to a moving goal position: [watch]  Two simulations of a 10D kneed walker walking on stilts and on uneven terrain: [watch]
     

  • Date: 7/6/2007
    Title: A Direct, Bayesian Approach to Continuous Global Function Optimization
    Speaker
    : Kevin Seppi, Brigham Young University
    Abstract: While continuous global optimization is often successfully tackled by such popular approaches as simulated annealing, genetic algorithms, and particle swarm optimization, these methodologies make sampling decisions based mostly on heuristics and intuition. Therefore, the very way in which these algorithms are designed makes it difficult to analyze them, to improve them, or to incorporate additional information about the target function.

    The focus of this presentation is a Bayesian and decision-theoretic model of global optimization that is at once easy to understand, possible to rigorously analyze, reasonable to use, and infuriatingly consistent with No Free Lunch Theorems for Optimization.


     

  • Date: 4/13/2007
    Title: Agnostic Active Learning
    Speaker: John Langford, Yahoo! Research
    Abstract: Given only an assumption of IID samples, standard batch learning requires O(1/e^2) labeled samples to find an e-optimal hypothesis within a set of hypotheses. The fundamental promise of active learning is that with only O(log(1/e)) carefully chosen labeled samples, we can make the same guarantee. Several examples have been worked out showing that for specific (learning problem, hypothesis space) pairs, this dependence is possible. One frustrating aspect of active learning, is that it tended to require noise-free (or at least effective noise free) learning problems. I'll present an algorithm A^2 (Agnostic Active) which removes this constraint. Under an assumption of only IID samples, A^2 has the following guarantees:
    - For every (learning problem, hypothesis space) pair, the returned hypothesis is e-optimal with high probability.
    - For every (learning problem, hypothesis space) pair, the number of labeled samples required is no worse than batch learning.
    - For known pairs where O(log(1/e)) labeled examples are required without noise, we also achieve an O(log(1/e)) dependence.
     

  • Date: 2/28/2007
    Title: Modeling Science: Topic models of Scientific Journals and Other Large Text Databases
    Speaker: David Blei, Princeton University
    Abstract: A surge of recent research in machine learning and statistics has developed new techniques for finding patterns of words in document collections using hierarchical probabilistic models. These models are called "topic models" because the word patterns often reflect the underlying topics that are combined to form the documents; however topic models also naturally apply to such data as images and biological sequences.
    After reviewing the basics of topic modeling, I will describe two related lines of research in this field, which extend the current state of the art.
    First, while previous topic models have assumed that the corpus is static, many document collections actually change over time: scientific articles, emails, and search queries reflect evolving content, and it is important to model the corresponding evolution of the underlying topics. For example, an article about biology in 1885 will exhibit significantly different word frequencies than one in 2005. I will describe probabilistic models designed to capture the dynamics of topics as they evolve over time.
    Second, previous models have assumed that the occurrence of the different latent topics are independent. In many document collections, the presence of a topic may be correlated with the presence of another. For example, a document about sports is more likely to also be about health than international finance. I will describe a probabilistic topic model which can capture such correlations between the hidden topics.
    In addition to giving quantitative, predictive models of a corpus, topic models provide a qualitative window into the structure of a large document collection. This perspective allows a user to explore a corpus in a topic-guided fashion. We demonstrate the capabilities of these new models on the archives of the journal Science, founded in 1880 by Thomas Edison. Our models are built on the noisy text from JSTOR, an online scholarly journal archive, resulting from an optical character recognition engine run over the original bound journals.

  • Date: 2/13/2007
    Title: When success is central in stochastic models
    Speaker: Judy Goldsmith, University of Kentucky
    Abstract: I will discuss an ongoing research project to build decision-support software for advising, and how anthropologists and welfare case managers helped us create a new knowledge representation formalism. I will give brief introductions to the US "welfare-to-work" system; Markov decision processes with action results; "bowtie fragments" of dynamic Bayesian networks; a knowledge elicitation method for bowtie fragments, and the social interactions that led to bowtie fragments. I will tie all of this up with a brief discussion of the Social Construction of Technology theory.

  • Date: 4/5/2006
    Title: Utility based Data Mining
    Speaker: Gary Weiss
    Abstract: Early work in predictive data mining did not address the complex circumstances in which models are built and applied. It was    assumed that a fixed amount of training data were available-- at no cost-- and only simple objectives, namely predictive accuracy, were considered. Over time it became clear that these assumptions were unrealistic and that the economic utility of acquiring training data, building a model, and applying the model had to be considered. This led to work on active learning and cost-sensitive learning. However, most of this work only factored in utility considerations in one stage of the data mining process (i.e., data acquisition, model building, or applying the model). Recently, several colleagues and I proposed the term utility-based data mining (UBDM) to encompass all work that considers utility during the data mining process. The purpose of introducing this new term is to encourage researchers and practitioners to focus their attention on maximizing utility throughout the entire data mining process. In this talk I will give a brief overview of utility-based data mining, discuss why it is important, and review how existing research fits into this framework. I will also discuss what UBDM has to offer to reinforcement learning and vice-versa.
     
    For those who are interested, papers from the First International Workshop on Utility-Based Data Mining are available on-line.

  • Date: 3/29/2006
    Title: Learning and Planning in POMDPs
    Speaker: Eyal Even-Dar
    Abstract:
    We consider the most realistic reinforcement learning setting in which an agent starts in an unknown environment (the POMDP) and must follow one continuous and uninterrupted chain of experience with no access to ``resets'' or ``offline'' simulation. We provide algorithms for general POMDPs that obtain near optimal average reward. One algorithm we present has a convergence rate which depends exponentially on a certain horizon time of an optimal policy, but has no dependence on the number of (unobservable) states. The main building block of our algorithms is an implementation of an approximate reset strategy, which we show always exists in every POMDP.
    We also relate POMDPs to multiplicity automata -- showing that POMDPs can be represented by multiplicity automata with no increase in the representation size. Furthermore, we show that the size of the multiplicity automaton is equal to the rank of the predictive state representation. Therefore, we relate both the predictive state representation and POMDPs to the well-founded multiplicity automata literature.Based on the multiplicity automata representation, we provide a planning algorithm which is exponential only in the multiplicity automata rank rather than the number of states of the POMDP.

  • Date: 3/22/2006
    Title: Privacy Unmasked: the Separation of Information from Effect
    Speaker: Scott Stornetta (Equipoise)
    Abstract:
    Privacy issues have a large overlap with work in computer science, from security to social networking, spam to government requests for search engine information to a consumer's purchase history at Amazon. At the same time, there is often a sense that protecting one's privacy is a gradually losing battle, as information about individuals becomes more and more widespread.

    We suggest a different framework for the privacy question, which separates the information at stake from its unwanted effect. Viewed in this light, one can recast many of today's privacy problems in a way that suggests win-win solutions. We'll illustrate this theory with specific examples for new initiatives in networking, software, and online commerce.

  • Date: 11/16/2005
    Title: Programmable Reinforcement Learning Agents
    Speaker: David Andre
    Abstract:
    In this talk I will discuss the use of partial programming to create a
    system for agent design that allows the programmer to specify what they know, hint at what they suspect using shaping, and leave unspecified that which they don't know; the system then optimally completes the program through experience and takes advantage of the hierarchical structure of the specified program to speed learning. Specifically, I will present the ALisp system, a Lisp-based high-level partial programming language that allows the programmer to express his or her prior knowledge in a concise manner and to constrain the set of policies considered by the learning subsystem. Online learning algorithms for ALisp can be proved to converge to an optimal solution of the SMDP and thus to an optimal completion of the partial program. I will also present methods for exploiting the modularity of ALisp to speed the learning process using state abstraction. Empirical and theoretical results on ALisp with state abstraction will be presented, including the application of ALisp to a difficult continuous version of the Taxi problem. A thorny theoretical issue with respect to proving online convergence with state abstraction will be also discussed.

  • Date: 11/15/2005
    Title: Winning the DARPA Grand Challenge
    Speaker: Sebastian Thrun
    Abstract: The DARPA Grand Challenge has been the most significant challenge to the robotics community in more than a decade. The task was to build an autonomous robot capable of traversing 132 miles of unrehearsed desert terrain in less than 10 hours. In 2004, the best robot only made 7.3 miles. In 2005, Stanford won the challenge and the $2M prize money by successfully traversing the course in less than 7 hours. This talk, delivered by the leader of the Stanford Racing Team, will provide insights into the software architecture of Stanford's winning robot. The robot massively relied on machine learning and probabilistic modeling for sensor interpretation and control. The speaker will explain some of the basic algorithms that made this victory possible, and share some of the excitement characterizing this historic event.

  • Date: 11/2/2005
    Title: Probabilistic Policy Reuse in Reinforcement Learning
    Speaker: Fernando Fernandez Rebollo
    Abstract:
    We contribute Policy Reuse as a technique to improve a reinforcement learner with guidance from past learned similar policies. Our method relies on using the past policies in a novel way as a probabilistic bias where the learner faces three choices: the exploitation of the ongoing learned policy, the exploration of random unexplored actions, and the exploitation of past policies. We introduce the algorithm and its major components: an exploration strategy to include the new reuse bias, and a similarity metric to estimate the similarity of past policies with respect to a new one. We provide empirical results demonstrating that Policy Reuse improves the learning performance over different strategies that learn without reuse. Policy Reuse further contributes the learning of the structure of a domain. Interestingly and almost as a side effect, Policy Reuse identifies classes of similar policies revealing a basis of ”eigen-policies” of the domain. In general, Policy Reuse contributes to the overall goal of lifelong reinforcement learning, as (i) it incrementally builds a policy library; (ii) it provides a mechanism to reuse past policies; and (iii) it learns an abstract domain structure in terms of eigen-policies of the domain. This is a joint work with Prof. Manuela Veloso.
     

  • Date: 5/7/2005
    Title: When the past meets the future: Reaction and expectation on interval
    schedules of reinforcement
    Speaker:
    Elliot Ludvig
    Abstract:
    For most animals, rewarding stimuli exert multiple influences on behaviour. Rewards can selectively enhance actions (operant conditioning), change the value and salience of neutral stimuli (classical conditioning), and alter immediate motivational and affective states. On interval schedules of reinforcement, rewards show periodicity, and animals will generally time their responses to coincide with food availability. The first part of this talk presents results from a series of empirical studies with rats and pigeons that elucidate the mechanisms through which animals respond to dynamically-changing sequences of intervals. The latter portion explores how magnitude of reinforcement changes timing, drawing on results from a study using Brain Stimulation Reward (BSR) in rats. Both reward magnitude and interval duration produce their effects on timed responses through a combination of unlearned, immediate after-effects and a learned expectation of upcoming rewards. I conclude with the suggestion that reinforcement learning algorithms may benefit from the incorporation of both these aspects of rewards.

 

  • Date: 2/7/2005
    Title: Annotating Clustering Constraints with Feature Relevance Information
    Speaker: Dr.
    Marie desJardins
    Abstract: Constrained clustering uses membership constraints between pairs of data points to improve the performance of clustering algorithms [2].
    Previous work in this area has focused on two classes of binary constraints:
    MUST-LINK constraints (which indicate that two data points should be placed in the same cluster) and CANNOT-LINK constraints (which indicate that two data points should be placed in different clusters). One recent constrained clustering algorithm, MPCK-MEANS [2], integrates such constraints with a metric learning approach, yielding very good performance in a variety of domains. In this talk, I will describe our ongoing research to extend MPCK-MEANS by annotating the constraints with information about feature relevance. Specifically, each constraint may include a feature vector, indicating the degree to which a user (or oracle) believes that a particular feature is important for generating the MUST-LINK or CANNOT-LINK constraint that is associated with that pair of data points. I will present a method for automatically generating feature annotations (simulating a domain expert), and will describe our initial experimental results, which show that feature annotations can improve clustering performance for a given number of constraints. [1] Mikhail Bilenko, Sugato Basu, and Raymond J. Mooney, "Integrating constraints and metric learning in semi-supervised clusetring."
    In Proceedings of the 21st International Conference on Machine Learning (ICML-2004), pp. 81-88, Banff, Canada, July 2004. [2] Kiri Wagstaff, "Intelligent Clustering with Instance-Level Constraints." Cornell University Computer Science Ph.D. dissertation, 2002. Bio:Dr. Marie desJardins is an assistant professor in the Department of Computer Science and Electrical Engineering at the University of Maryland, Baltimore County.
    Prior to joining the faculty in 2001, Dr. desJardins was a senior computer scientist at SRI International in Menlo Park, California. Her research is in artificial intelligence, focusing on the areas of machine learning, multi-agent systems, planning, interactive AI techniques, information management, reasoning with uncertainty, and decision theory.

 

  • Date: Monday, December 13, 2004
    Title: Creating Intrinsic Value Systems for doing Reinforcement Learning in
    Developmental Robotics
    Speaker: Dr. Lisa Meeden
    Abstract:  Developmental robotics is a move away from task-specific design where a robot is programmed to accomplish a particular pre-defined goal and instead explores the kinds of capabilities that a robot can discover through  self-motivated actions based on its own body and the dynamic structure of its environment. I will review the ways in which intrinsic value systems have been used to implement reinforcement learning so as to induce self-motivated actions. Then I will describe our own approach that is based on the competition between two innate pressures within the developing robot:
    the need to accurately predict the environment while simultaneously trying
    to seek out novelty in the environment.

    This work has been done in collaboration with Douglas Blank and Deepak Kumar at Bryn Mawr College, and James Marshall at Pomona College.
     

 

  • Date: Monday, 11/29/2004
    Title: What is Thought?
    Speaker: Eric B. Baum
    Abstract: What is thought, and how can it arise from a material brain, which in principle could be simulated by an electronic computer? How are understanding and consciousness produced by physical processes? How exactly do nature and nurture interact? What is language, why did it take 4 billion years to evolve, and how are words related to meaning? What is meaning? What are will and awareness?
    A model is presented that addresses these questions in a concrete and principled way. The computational learning literature has explained concept learning as stemming from a formalized Occam's razor: a compact program (Occam explanation) consistent with enough examples of a concept yields generalization to new examples. These results are extrapolated to the conjecture that a compact enough program that solves enough reinforcement learning trials exploits the underlying structure of its world and will generalize to solve new problems. The ramifications of this conjecture are discussed and shown to answer the above questions in a way that brings to bear results of complexity theory, is consistent with data from a variety of fields, and makes empirical predictions.

    As an illustrative example, results are presented from experiments in which the evolution of an economy of programs is simulated. Two principles, "conservation of money" and "property rights" are imposed on the economy. The economy is seen to evolve, starting from random code, to solve a collection of reinforcement learning examples. As predicted by the extended Occam's razor, the economy then generalizes to solve new problems. But moreover, these experiments provide evidence illuminating how and when a social system can organize itself to take appropriate decisions.

    This talk is based on the eponymous book (MIT Press, 2004).
     

  • Date: Monday, 10/18/2004
    Title: Model-guided Mining and Discovery
    Speaker: Dr. Rafael Alonso
    Abstract: In this talk I will describe the research done at Sarnoff developing adaptive models to tailor the behavior of search and knowledge discovery systems to the needs of individual users. I will describe our development of dynamically adaptive models that capture user intent and entity behavior, and can dynamically adapt by both expanding model content (with the aid of an ontology) and by refining measures of user interest (based on implicit and explicit user feedback). I will describe previous and current work in the context of several applications and present experimental evaluation results.
    I will also describe how these models are being used to develop an anomaly detection system for surveillance applications, and outline our recent work in threat assessment algorithms. The talk will conclude with a short description of the commercial applications of our work.