Dimension Reduction for Text
We (Michael Littman, Charles Isbell, Haym Hirsh, possibly Ilya
Muchnik, Parry Husbands) are surveying approaches to using dimension
reduction to create similarity metrics for text. Text retrieval and
text classification are our target applications.
Here is a partial list of approaches and papers we've been looking at:
Coming up:
- Self-organizing maps (SOM)
- Singular value decomposition (SVD): (Lillian Lee paper?)
- Principal Component Analysis (PCA): (Newby?)
- Probabilistic latent semantic indexing (PLSI): (Hofmann)
- A method by Ilya Muchnik
- Neural-network autoassociation?
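Here is a minimal sketch of the SVD route, using the latent semantic indexing setup. It assumes NumPy and a raw term-by-document count matrix with no tf-idf weighting. The toy corpus, the function names (`lsi_embed`, `fold_in`, `cosine`) and k = 2 are illustrative choices and are not taken from any of the papers listed. The example shows why dimension reduction is interesting as a similarity metric. The "car" and "automobile" documents share only the term "engine", yet they become near-identical once projected onto the top two singular directions.

```python
import numpy as np

# Hypothetical toy corpus: rows are terms, columns are documents (raw counts,
# no tf-idf weighting).
terms = ["car", "automobile", "engine", "flower", "petal"]
term_doc = np.array([
    [2, 0, 1, 0],  # car
    [0, 2, 1, 0],  # automobile
    [1, 1, 2, 0],  # engine
    [0, 0, 0, 2],  # flower
    [0, 0, 0, 1],  # petal
], dtype=float)


def lsi_embed(a: np.ndarray, k: int):
    """Truncated SVD of a terms x documents matrix.

    Returns (doc_vecs, u_k): doc_vecs is documents x k (V_k scaled by the
    top-k singular values); u_k maps term-count vectors into the same space.
    """
    # Thin SVD: a = u @ diag(s) @ vt, with s sorted in decreasing order.
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    u_k = u[:, :k]
    doc_vecs = vt[:k, :].T * s[:k]  # row j is document j in k dimensions
    return doc_vecs, u_k


def fold_in(term_vec: np.ndarray, u_k: np.ndarray) -> np.ndarray:
    """Project a new document or query (a term-count vector) into the space."""
    return term_vec @ u_k


def cosine(x: np.ndarray, y: np.ndarray) -> float:
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


doc_vecs, u_k = lsi_embed(term_doc, k=2)
print(cosine(term_doc[:, 0], term_doc[:, 1]))  # raw space: about 0.2
print(cosine(doc_vecs[0], doc_vecs[1]))        # reduced space: close to 1
print(cosine(doc_vecs[0], doc_vecs[3]))        # unrelated topic: close to 0
```

Scaling document coordinates by the singular values (V_k S_k) is one convention. Comparing rows of V_k alone changes the cosine values but not the basic effect.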
Questions:
- How do we pick the dimensionality?
- Why do it? (to ease later processing, improve computational
efficiency, reduce noise, improve task performance, recover
underlying causes)
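One common heuristic for the dimensionality question keeps the smallest k whose singular values retain a chosen fraction of the matrix's squared Frobenius norm, or "energy". This is our assumption here and not a recommendation from the papers above. The 0.9 threshold and the name `choose_rank` are hypothetical.

```python
import numpy as np


def choose_rank(singular_values, energy: float = 0.9) -> int:
    """Smallest k whose top-k singular values retain at least `energy` of the
    total squared singular-value mass (the squared Frobenius norm).

    A heuristic, not a principled answer to "how many dimensions?".
    """
    s2 = np.sort(np.asarray(singular_values, dtype=float))[::-1] ** 2
    cumulative = np.cumsum(s2) / s2.sum()
    # First index whose cumulative fraction reaches the threshold; clamp in
    # case rounding leaves the last entry a hair below 1.0.
    k = int(np.searchsorted(cumulative, energy)) + 1
    return min(k, len(s2))


# Hypothetical singular values; in practice use np.linalg.svd(a, compute_uv=False).
print(choose_rank([3.0, 2.0, 1.0], energy=0.9))  # 2
```

If the energy curve has no clear elbow, the alternative is to tune k directly against retrieval or classification results on held-out data.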
Organizer: Michael L. Littman