Time: Tuesday 10:30-noon

Place: Rutgers, Core A

Semester: Fall 2003

I hosted weekly meetings this fall for people interested in studying the use of machine-learning algorithms for developing representations for use in AI. Each week, we discussed two or three research papers on topics such as corpus analysis, decision making, and similarity metrics.

Sue Dumais alerted me to a relevant NIPS workshop some of us might be interested in attending.

- 09/02/03:
*Introduction. What is a representation anyway?* Discussant: Michael Littman.

- 09/09/03:
*Learning spatial representations for text and images, LSA.* Landauer (up to and including page 30). Discussant: Kostas Kleisouris. Kostas' slides.

- 09/16/03:
*Corpus learning of semantics and syntax.* Landauer (rest of paper). Discussant: Xiaofeng Mi. Dennis. Discussant: David DeVault. David's slides.

- 09/23/03:
*Corpus learning of syntax and semantics.* Griffiths and Steyvers. Discussant: Nikita Lytkin. Nikita's slides. Lee and Seung. Discussant: Zhipeng Zhao. Zhipeng's slides. (Other relevant events: Homeland Security Conference, Cog Sci welcome lunch.)

- 09/30/03:
*Learning about language and action.* Heeringa and Oates. Discussant: Chengling (Lynn) Chan. Lynn's notes. Oates, Schmill and Cohen. Discussant: Jacek Rawicki. Jacek's notes. Also a paper on derivative dynamic time warping and clustering.

- 10/07/03:
*Representation by nonlinear embedding.* Saul and Roweis. Discussant: Chan-Su Lee. Chan-Su's slides. Tenenbaum. Discussant: Rong Xu. Rong Xu's slides. Also, we heard a presentation from Ahmed Elgammal on his work with LLE and Isomap in human motion analysis.

- 10/14/03:
*Using images in retrieval and behavior.* Jeon, Lavrenko and Manmatha. Discussant: Carlos Diuk. Carlos' slides. Asada, Noda, Umida, and Hosoda. Discussant: Subarna Sadhukhan. Subarna's slides.

- 10/21/03:
*Low-level feature learning.* Freeman, Pasztor, and Carmichael. Discussant: Rui Huang. Rui's slides. Also in PDF. Hertzmann, Jacobs, Oliver, Curless, and Salesin. Discussant: Dongsheng Wang. Dongsheng's slides.

- 10/28/03:
*Relational learning.* Taskar, Abbeel, Wong, and Koller. Discussant: Yufei Pan. Yufei's slides. Neville, Jensen, Friedland, and Hay. Discussant: Andrew Tjang. Andrew's slides.

- 11/04/03:
*Predictive state representations.* James and Singh. Discussant: Paul Batchis. Paul's slides. Jaeger. (Focus on Section 10.8, pages 65--75.) Discussant: Michael Cole. Michael's slides.

- 11/11/03:
*Projective representations.* Bingham and Mannila. Discussant: Kooksang Moon. Karypis and Han. Discussant: Jason Keller. Jason's slides. Sasaki and Kita. Discussant: Zhiguo Li. Zhiguo's slides. (Michael at DARPA workshop.)

- 11/18/03:
*Learning from word distributions.* Pereira, Tishby and Lee. Discussant: Juan A. Ramos. Juan's slides. Baker and McCallum. Discussant: Tom Walsh. Tom's slides. Slonim and Tishby. Discussant: Yangzhe Xiao. Yangzhe's slides.

- 11/25/03: No meeting: Thursday classes.

- 12/02/03:
*Hierarchical reinforcement learning.* Dietterich. Discussant: Zhi Wei. Zhi's slides. McGovern and Barto. Discussant: Nishkam Ravi. Nishkam's slides. Smart and Kaelbling. Discussant: Dave LeRoux. Dave's slides (and mountain car example).

- 12/09/03:
*Multiple views.* Last meeting. Chen, Thakkar, Knoblock and Shahabi. Discussant: Oncel Tuzel. Oncel's slides. Turney, Littman, Bigham, and Shnayder. Discussant: Timothy Edmunds. Seitz. Discussant: Stephen Max. Stephen's slides.

A comparison of statistical models for the extraction of lexical information from text corpora, Dennis. One criticism of the LSA approach is that it doesn't handle syntactic information at all. Dennis proposed the "Syntagmatic Paradigmatic Model" as a more sophisticated model in the spirit of LSA. Landauer liked it so much that he hired Dennis. Dennis has compiled a list of his favorite models.
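Since LSA is the reference point for several of these papers, here is a minimal sketch of its core idea, a truncated SVD of a term-document count matrix, on toy data. The tiny corpus, the vocabulary handling, and the choice of k are all my own illustrative choices, not taken from any of the papers:

```python
import numpy as np

# Toy term-document count matrix; real LSA uses large corpora and
# tf-idf-style weighting.
docs = ["cat purrs", "cat meows", "dog barks", "dog growls"]
vocab = sorted({w for d in docs for w in d.split()})
X = np.array([[d.split().count(w) for d in docs] for w in vocab], float)

# LSA: keep only the top-k singular components of the term-document matrix.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]          # k-dimensional term representations

def sim(w1, w2):
    """Cosine similarity between two terms in the reduced space."""
    a, b = term_vecs[vocab.index(w1)], term_vecs[vocab.index(w2)]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Terms that appear in similar contexts end up close, even though
# "purrs" and "meows" never co-occur in the same document:
print(sim("purrs", "meows") > sim("purrs", "barks"))  # True
```

Note that word order plays no role anywhere in this computation, which is exactly the syntactic blindness Dennis's model is meant to address.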

A probabilistic approach to semantic representation, Griffiths and Steyvers. Dennis recommends this line of work.

Learning the parts of objects by nonnegative matrix factorization, Lee and Seung. This paper has the intriguing premise that you can do something like LSA but constrain the factorization to nonnegative values, and that this constraint tends to produce representations that resemble a decomposition into parts.
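A rough sketch of the Lee-Seung multiplicative updates on synthetic parts-based data; the data matrix, sizes, and iteration count below are my own illustrative choices (Lee and Seung used face images and text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic nonnegative data: each column mixes two ground-truth "parts".
parts = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], float)
V = parts @ rng.random((2, 30))       # 4 x 30 nonnegative matrix

# Lee-Seung multiplicative updates for V ~ W @ H with W, H >= 0.
# Because factors are updated by elementwise multiplication with
# nonnegative ratios, they can never go negative.
k = 2
W = rng.random((4, k)) + 0.1
H = rng.random((k, 30)) + 0.1
for _ in range(1000):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

# Relative reconstruction error shrinks toward zero on this easy case,
# and W's columns line up with the two parts.
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(round(err, 4))
```

The contrast with SVD is that the nonnegativity forbids the cancellation between components that makes SVD factors holistic, which is where the "parts" interpretation comes from.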

Label and link prediction in relational data, Taskar, Abbeel, Wong, and Koller. Relational learning is a very hot area, one that promises to narrow the gap between classical first-order AI representations and machine learning. This is one paper on the topic, though there are many others; I hope it can serve as a jumping-off point. Getoor and Jensen maintain a relational learning home page.

Learning relational probability trees, Neville, Jensen, Friedland,
and Hay. The UMass work in this area is really top notch. It's not
exactly learning complex representations, but it does show a way of
*using* a complex representation to learn more effectively.
This particular paper was recommended by Amy McGovern of UMass.

Incrementally learning parameters of stochastic context-free grammars using summary statistics, Heeringa and Oates. Oates is interested in creating robotic systems that learn representations of their environments. This paper should provide a brief introduction to language learning.

Random projection in dimensionality reduction: Applications to image and text data, Bingham and Mannila. Haym suggested this one.
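The technique itself fits in a few lines. Here is a sketch, with purely illustrative sizes, of projecting data through a random Gaussian matrix and checking that a pairwise distance survives, which is the Johnson-Lindenstrauss effect the paper builds on:

```python
import numpy as np

rng = np.random.default_rng(0)

# 500 points in 1000 dimensions, projected down to 200 dimensions.
n, d, k = 500, 1000, 200
X = rng.standard_normal((n, d))
R = rng.standard_normal((d, k)) / np.sqrt(k)  # scaling preserves norms in expectation
Y = X @ R                                     # no SVD, no training: one matmul

# Johnson-Lindenstrauss effect: pairwise distances survive approximately.
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
print(round(proj / orig, 2))                  # typically within a few percent of 1
```

The contrast with LSA/PCA is the appeal: the projection is data-independent and nearly free to compute, and Bingham and Mannila's question is how much that costs in practice on images and text.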

Distributional clustering of English words, Pereira, Tishby and Lee. Word clustering, suggested by Ilya.

Distributional clustering of words for text classification, Baker and McCallum. Word clustering, suggested by Ilya.

(Didn't cover this one.) A divisive information-theoretic feature clustering algorithm for text classification, Dhillon and Kumar. Word clustering, suggested by Ilya.

The power of word clusters for text classification, Slonim and Tishby. Word clustering, suggested by Ilya.

(Didn't cover this one.) Concept decompositions for large sparse text data using clustering, Dhillon and Modha. Concept projections, suggested by Ilya.

Fast supervised dimensionality reduction algorithm with applications to document categorization and retrieval, Karypis and Han. Concept projections, suggested by Ilya.

Vector space information retrieval using concept projection, Sasaki and Kita. Concept projections, suggested by Ilya.

A global geometric framework for nonlinear dimensionality reduction, Tenenbaum, de Silva, Langford. Very similar idea to LLE and is also being used by Prof. Elgammal.
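For orientation, a minimal sketch of the three Isomap steps (neighborhood graph, geodesic distances via shortest paths, classical MDS) on a toy one-dimensional manifold; the data and parameters are illustrative only, not from the paper:

```python
import numpy as np

# Points sampled along a curved 1-D manifold embedded in 2-D (a half circle).
t = np.linspace(0, np.pi, 40)
X = np.c_[np.cos(t), np.sin(t)]
n = len(X)

# 1. k-nearest-neighbor graph weighted by Euclidean distance.
D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
k = 3
G = np.full_like(D, np.inf)
for i in range(n):
    nbrs = np.argsort(D[i])[:k + 1]          # includes the point itself
    G[i, nbrs] = D[i, nbrs]
G = np.minimum(G, G.T)                       # symmetrize

# 2. Geodesic distances = all-pairs shortest paths (Floyd-Warshall).
for m in range(n):
    G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])

# 3. Classical MDS on the geodesic distance matrix.
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (G ** 2) @ J
vals, vecs = np.linalg.eigh(B)
embedding = vecs[:, -1] * np.sqrt(vals[-1])  # top 1-D coordinate

# The 1-D embedding should recover arc length along the curve:
corr = abs(np.corrcoef(embedding, t)[0, 1])
print(round(corr, 3))
```

The only difference from plain MDS is step 2: distances are measured along the neighborhood graph rather than through the ambient space, which is what lets Isomap "unroll" curved manifolds.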

Purposive behavior acquisition for a real robot by vision-based reinforcement learning, Asada, Noda, Umida, Hosoda. This might be the most advanced piece of work on using the traditional state-based representation for reinforcement learning on a real robot. (We might need a photocopy of the paper from the library, as the online version looks awful.)

A method for clustering the experiences of a mobile robot that accords with human judgments, Oates, Schmill and Cohen. I don't know anything about this, but Oates' goal of creating mobile robots that learn about their environments sounds intriguing to me. Seems like the clustering takes place on sensor readings.

Learning and discovery of predictive state representations in dynamical systems with reset, James and Singh. I introduced predictive state representations in a paper with Satinder and Rich Sutton, but we didn't deal with learning issues. This paper sounds like an interesting attempt to do some learning.

Discrete-time, discrete-valued observable operator models: a tutorial, Jaeger. Jaeger has done an amazing job of developing a representation of dynamical systems based on observable data. With some of my collaborators at AT&T, we developed the model into one that represents controlled environments. Jaeger says that his model already covers all of this, so I'd like to take a closer look at his stuff (specifically Section 10.8, pages 65--75).

Automatic image annotation and retrieval using cross-media relevance models, Jeon, Lavrenko and Manmatha. How do we represent visual images for matching, recognition, etc.? This paper gives a probabilistic approach inspired by cross-language information retrieval. Perhaps this can be used in a visually-guided instance-based RL setting.

Learning low-level vision, Freeman, Pasztor, Carmichael. Suggested by Dongsheng ("Oliver") Wang, this appears to be a classic vision paper.

Image analogies, Hertzmann, Jacobs, Oliver, Curless, and Salesin. Peter Turney suggested this paper to me and it looks intriguing!

I did some related work on multiple choice exams.

Automatically annotating and integrating spatial datasets, Chen, Thakkar, Knoblock and Shahabi. Great example of using multiple sources of data.

Active learning with strong and weak views: a case study on wrapper induction, Muslea, Minton, Knoblock. I heard a talk on this, and it's a really nice paper that illustrates the importance and value of using multiple approaches to a problem, especially in the context of learning.

I'm not sure we should cover these topics, but I'm open to suggestions: graphical games, classifying documents by style, local similarity models (local LSI), canonical correlation analysis (Thompson?), cognitive perspectives on metaphor and analogy, RL for routers, Pereira's Academy of Science paper on weak evidence, Tony Veale's analogical thesaurus work, hierarchical MDPs, Made Up Minds.

Automatic discovery of subgoals in reinforcement learning using diverse density, McGovern and Barto. Learns in the options model of hierarchical reinforcement learning.

An overview of MAXQ hierarchical reinforcement learning, Dietterich. Summarizes Tom's classic work on a model of hierarchical reinforcement learning. From Nishkam: more on hierarchical RL.