Here is a partial list of approaches and papers we've been looking at:

- Independent component analysis (ICA): Restructuring sparse high dimensional data for effective retrieval, Independent components in text [Survey on independent components analysis]
- Information bottleneck (IB): Document clustering using word clusters via the information bottleneck method, Unsupervised document classification using sequential information maximization (pdf), The information bottleneck method [Tutorial slides]
- Locally Linear Embedding (LLE): Nonlinear Dimensionality Reduction by Locally Linear Embedding
- Nonnegative matrix factorization (NMF): Learning the parts of objects by nonnegative matrix factorization (pdf)
- Clustering: An impossibility theorem for clustering
- Local LSI (LLSI): A comparison of classifiers and document representations for the routing problem

- Self-organizing maps (SOM)
- Singular value decomposition (SVD): (Lillian Lee paper?)
- Principal component analysis (PCA): (Newby?)
- Probabilistic latent semantic indexing (PLSI): (Hofmann)
- A method by Ilya Muchnik
- Neural net autoassociation?

- How to pick the dimensionality?
- Why do it? (ease later processing, improve computational performance, reduce noise, improve task performance, recover underlying causes)
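
As one possible starting point for the dimensionality question, here is a minimal sketch of a common heuristic: keep the smallest number of dimensions whose squared singular values cover a chosen fraction of the total. It assumes the singular values have already been computed (for example by SVD or PCA). The function name `pick_dimensionality` and the `variance_to_keep` threshold are illustrative choices, not taken from any of the papers listed above.

```python
def pick_dimensionality(singular_values, variance_to_keep=0.9):
    """Return the smallest k such that the k largest squared singular
    values account for at least `variance_to_keep` of their total.

    Hypothetical helper for illustration; the 0.9 default is an
    arbitrary assumption, not a recommended value.
    """
    if not 0.0 < variance_to_keep <= 1.0:
        raise ValueError("variance_to_keep must be in (0, 1]")

    # Squared singular values, largest first.
    energies = sorted((s * s for s in singular_values), reverse=True)
    total = sum(energies)
    if total == 0:
        raise ValueError("need at least one nonzero singular value")

    running = 0.0
    for k, energy in enumerate(energies, start=1):
        running += energy
        if running >= variance_to_keep * total:
            return k

    # Reached only if rounding keeps the running sum just below the target.
    return len(energies)
```

For singular values `[10, 5, 1, 0.5]` and the default threshold, the function returns 2, since the two largest squared values (100 and 25) cover more than 90% of the total of 126.25. This heuristic only addresses the noise-reduction motivation; a threshold chosen this way need not be the best choice for later retrieval or classification performance.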

Organizer: Michael L. Littman