Learned Representations in AI

Organizer: Michael L. Littman

Time: Tuesday 10:30-noon
Place: Rutgers, Core A
Semester: Fall 2003


This fall I hosted weekly meetings for people interested in using machine-learning algorithms to develop representations for AI. Each week, we discussed two or three research papers on topics such as corpus analysis, decision making, and similarity metrics.

Sue Dumais alerted me to a relevant NIPS workshop that some of us might want to attend.

Papers

Here is some background about the papers, in no particular order.

Text

On the computational basis of learning and cognition: Arguments from LSA, Landauer. This paper is much longer than I remembered, but it had a profound effect on me. For learning to apply to real-life problems, people must not have to label every relevant example. One way to achieve this is to squeeze more information out of each labeled example that is available. Landauer's argument instead inspires us to squeeze small bits of information out of a large number of unlabeled examples. Unlabeled data is what we have the most of, so let's figure out how to use it to create usable representations!

A comparison of statistical models for the extraction of lexical information from text corpora, Dennis. One of the criticisms of the LSA approach is that it doesn't handle syntactic information at all. Dennis proposed the Syntagmatic Paradigmatic Model as a more sophisticated model in the spirit of LSA. Landauer liked it so much that he hired Dennis. Dennis has compiled a list of his favorite models.

A probabilistic approach to semantic representation, Griffiths and Steyvers. Dennis recommends this line of work.

Learning the parts of objects by nonnegative matrix factorization, Lee and Seung. This paper has the intriguing premise that you can do something like LSA but restrict the factorization to nonnegative values, and that doing so tends to produce representations that resemble a decomposition into parts.
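To make the contrast with LSA concrete: LSA uses a singular value decomposition, whose factors can contain negative entries, while NMF approximates a nonnegative matrix V by a product W H in which both factors are constrained to be nonnegative. Here is a minimal pure-Python sketch, assuming the multiplicative update rules for squared Euclidean error that Lee and Seung analyzed in later work; the Nature paper itself may use a different (divergence-style) objective. The helper names (matmul, nmf, sq_error) and the toy matrix are hypothetical, chosen only for illustration.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def nmf(V, r, iters=500, seed=0, eps=1e-9):
    """Hypothetical helper: factor a nonnegative n x m matrix V as
    W (n x r) times H (r x m) with multiplicative updates (squared error)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random start. Each update multiplies an entry by a
    # nonnegative ratio, so W and H can never become negative.
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(r)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(r)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(r)]
             for i in range(n)]
    return W, H


def sq_error(V, W, H):
    """Squared Frobenius distance between V and W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


# Toy 4 x 3 nonnegative matrix that is exactly a product of rank-2
# nonnegative factors.
V = [[1, 2, 0], [0, 1, 3], [1, 3, 3], [2, 5, 3]]
W, H = nmf(V, r=2)
print(round(sq_error(V, W, H), 4))
```

The rows of H play the role of "parts" here: every row of V is rebuilt by adding nonnegative multiples of them, with no cancellation allowed, which is the intuition behind the parts-based interpretation.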

Label and link prediction in relational data, Taskar, Abbeel, Wong, and Koller. Relational learning is a very hot area, which holds the promise of narrowing the gap between classical first-order AI representations and machine learning. This is one paper on the topic among many; I hope it can serve as a jumping-off point. Getoor and Jensen maintain a relational learning home page.

Learning relational probability trees, Neville, Jensen, Friedland, and Hay. The UMass work in this area is really top-notch. It's not exactly learning complex representations, but it does show a way of using a complex representation to learn more effectively. This particular paper was recommended by Amy McGovern of UMass.

Incrementally learning parameters of stochastic context-free grammars using summary statistics, Heeringa and Oates. Oates is interested in creating robotic systems that learn representations of their environments. This paper should provide a brief introduction to language learning.

Random projection in dimensionality reduction: Applications to image and text data, Bingham and Mannila. Haym suggested this one.

Distributional clustering of English words, Pereira, Tishby and Lee. Word clustering, suggested by Ilya.

Distributional clustering of words for text classification, Baker and McCallum. Word clustering, suggested by Ilya.

(Didn't cover this one.) A divisive information-theoretic feature clustering algorithm for text classification, Dhillon and Kumar. Word clustering, suggested by Ilya.

The power of word clusters for text classification, Slonim and Tishby. Word clustering, suggested by Ilya.

(Didn't cover this one.) Concept decompositions for large sparse text data using clustering, Dhillon and Modha. Concept projections, suggested by Ilya.

Fast supervised dimensionality reduction algorithm with applications to document categorization and retrieval, Karypis and Han. Concept projections, suggested by Ilya.

Vector space information retrieval using concept projection, Sasaki and Kita. Concept projections, suggested by Ilya.

Vision/Sensors

An introduction to locally linear embedding, Saul and Roweis. LLE is an approach to uncovering low-dimensional structure in high-dimensional data by exploiting the fact that each point and its nearby neighbors lie on a locally linear patch. Prof. Elgammal at Rutgers is using this algorithm to analyze visual images, and we're trying to apply it to the Aibo robots as well. Saul and Roweis have an LLE home page.
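LLE has three steps: find each point's nearest neighbors, compute weights that best reconstruct each point as an affine combination of its neighbors, and then find low-dimensional coordinates that are best reconstructed by those same weights (an eigenvector problem on (I - W)^T (I - W)). The pure-Python sketch below covers only the first two steps and omits the eigenvector step. The function names, the Gaussian-elimination helper, and the "reg times trace" regularization of the local Gram matrix are my assumptions (a common heuristic), not code from the paper.

```python
def sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lle_weights(X, k, reg=1e-3):
    """Hypothetical helper for LLE steps 1 and 2: row i of the result holds
    weights (summing to 1) that reconstruct X[i] from its k nearest neighbors."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Step 1: k nearest neighbors of point i (excluding i itself).
        nbrs = sorted((j for j in range(n) if j != i),
                      key=lambda j: sqdist(X[i], X[j]))[:k]
        # Neighbors expressed relative to x_i.
        Z = [[xj - xi for xj, xi in zip(X[j], X[i])] for j in nbrs]
        # Local Gram matrix C[a][b] = (x_a - x_i) . (x_b - x_i).
        C = [[sum(p * q for p, q in zip(Z[a], Z[b])) for b in range(k)]
             for a in range(k)]
        # Regularize so C is invertible even when neighbors are collinear.
        trace = sum(C[a][a] for a in range(k))
        for a in range(k):
            C[a][a] += reg * trace if trace > 0 else reg
        # Step 2: solve C w = 1, then rescale so the weights sum to 1.
        w = solve(C, [1.0] * k)
        total = sum(w)
        for a, j in enumerate(nbrs):
            W[i][j] = w[a] / total
    return W


# Points on a line: the middle point is halfway between its two neighbors.
X = [[0, 0], [1, 1], [2, 2], [3, 3], [4, 4]]
W = lle_weights(X, k=2)
print([round(w, 3) for w in W[2]])
```

Because the weights are invariant to rotating, rescaling, and translating a neighborhood, they capture the local geometry, which is what the final embedding step tries to preserve.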

A global geometric framework for nonlinear dimensionality reduction, Tenenbaum, de Silva, Langford. This is the Isomap paper, a very similar idea to LLE, and it is also being used by Prof. Elgammal.

Purposive behavior acquisition for a real robot by vision-based reinforcement learning, Asada, Noda, Umida, Hosoda. This might be the most advanced piece of work on using the traditional state-based representation for reinforcement learning on a real robot. (We might need a photocopy of the paper from the library, as the online version looks awful.)

A method for clustering the experiences of a mobile robot that accords with human judgments, Oates, Schmill and Cohen. I don't know anything about this, but Oates' goal of creating mobile robots that learn about their environments sounds intriguing to me. It seems that the clustering is performed on sensor readings.

Learning and discovery of predictive state representations in dynamical systems with reset, James and Singh. I introduced predictive state representations in a paper with Satinder and Rich Sutton, but we didn't deal with learning issues. This paper sounds like an interesting attempt to do some learning.

Discrete-time, discrete-valued observable operator models: a tutorial, Jaeger. Jaeger has done an amazing job of developing a representation of dynamical systems based on observable data. With some of my collaborators at AT&T, I extended the model to represent controlled environments. Jaeger says that his model already covers all of this, so I'd like to take a closer look at his work (specifically Section 10.8, pages 65-75).
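The core idea of an observable operator model is that a process over a finite alphabet is described by a starting vector w0 and one linear operator tau_a per observable symbol a, and the probability of a sequence a_1 ... a_k is 1^T tau_{a_k} ... tau_{a_1} w0. Here is a minimal sketch, assuming the easiest way to get valid operators for illustration: deriving them from a small hidden Markov model. Jaeger's point is that operators can be estimated directly from observed sequence statistics and that OOMs are strictly more general than HMMs, so this construction is only a convenient special case. The index conventions, function names, and numbers below are my own hypothetical choices, not the tutorial's.

```python
def operators_from_hmm(T, O):
    """Hypothetical helper: tau_a[i][j] = T[j][i] * O[j][a],
    i.e. tau_a = T^T diag(O[:, a])."""
    n, m = len(T), len(O[0])
    return [[[T[j][i] * O[j][a] for j in range(n)] for i in range(n)]
            for a in range(m)]


def sequence_probability(taus, w0, seq):
    """P(a_1 ... a_k) = 1^T tau_{a_k} ... tau_{a_1} w0."""
    w = list(w0)
    for a in seq:
        tau = taus[a]
        # Apply the operator for the observed symbol to the state vector.
        w = [sum(tau[i][j] * w[j] for j in range(len(w))) for i in range(len(w))]
    # Summing the entries is the 1^T (all-ones row vector) step.
    return sum(w)


# Two hidden states and two observable symbols (0 and 1); numbers are made up.
T = [[0.9, 0.1], [0.2, 0.8]]   # T[i][j] = P(next state j | current state i)
O = [[0.7, 0.3], [0.1, 0.9]]   # O[i][a] = P(symbol a | current state i)
w0 = [0.5, 0.5]                # initial state distribution
taus = operators_from_hmm(T, O)
print(round(sequence_probability(taus, w0, [0]), 10))  # prints 0.4
```

Since the operators summed over all symbols give T^T, whose columns each sum to 1, the probabilities of all sequences of a fixed length add up to 1. That column-sum condition, rather than any hidden-state interpretation, is what an OOM needs.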

Automatic image annotation and retrieval using cross-media relevance models, Jeon, Lavrenko and Manmatha. How do we represent visual images for matching, recognition, etc.? This paper gives a probabilistic approach inspired by cross-language information retrieval. Perhaps this can be used in a visually-guided instance-based RL setting.

Learning low-level vision, Freeman, Pasztor, Carmichael. Suggested by Dongsheng ("Oliver") Wang, this appears to be a classic vision paper.

Image analogies, Hertzmann, Jacobs, Oliver, Curless, and Salesin. Peter Turney suggested this paper to me and it looks intriguing!

Multiple Views

(Didn't cover this one.) Robust software via agent-based redundancy, Huhns, Holderfield, and Gutierrez. This paper isn't about learning at all and only tangentially about representation. However, the idea that robust, complex behaviors are best achieved by employing multiple strategies simultaneously is intriguing.

I did some related work on multiple-choice exams.

Automatically annotating and integrating spatial datasets, Chen, Thakkar, Knoblock and Shahabi. Great example of using multiple sources of data.

Active learning with strong and weak views: a case study on wrapper induction, Muslea, Minton, Knoblock. I heard a talk on this, and it's a really nice paper that illustrates the importance and value of using multiple approaches to a problem, especially in the context of learning.

Other Topics

(Didn't cover this one.) Decision Region Connectivity Analysis: A method for analyzing high-dimensional classifiers, Ofer Melnik. Ofer suggested this one, which extracts representations from classifiers trained on data.

I'm not sure we should cover these topics, but I'm open to suggestions: graphical games, classifying documents by style, local similarity models (local LSI), canonical correlation analysis (Thompson?), cognitive perspectives on metaphor and analogy, RL for routers, Pereira's Academy of Science paper on weak evidence, Tony Veale's analogical thesaurus work, hierarchical MDPs, Made Up Minds.

Automatic discovery of subgoals in reinforcement learning using diverse density, McGovern and Barto. The discovered subgoals are used to create new options within the options framework of hierarchical reinforcement learning.

An overview of MAXQ hierarchical reinforcement learning, Dietterich. Summarizes Tom's classic work on a model of hierarchical reinforcement learning. Suggested by Nishkam as further reading on hierarchical RL.