Efficient Singular Value Decomposition via Improved Document Sampling

Fan Jiang
Department of Computer Science
Duke University, Durham, NC 27708-0129
fan@cs.duke.edu

Ravi Kannan
Department of Computer Science
Yale University, New Haven, CT 06511
kannan@cs.yale.edu

Michael L. Littman
Department of Computer Science
Duke University, Durham, NC 27708-0129
mlittman@cs.duke.edu

Santosh Vempala
Department of Mathematics and Laboratory for Computer Science
M.I.T., Cambridge, MA 02138
vempala@math.mit.edu

Singular value decomposition (SVD) is a general-purpose mathematical analysis tool that has been used in a variety of information-retrieval applications. As the size and complexity of retrieval collections increase, it is crucial for our analysis tools to scale accordingly. To this end, we have studied the application of a new theoretically justified SVD approximation algorithm to the problem of text retrieval. We show that, in the case of latent semantic indexing, we can achieve near optimal approximations of the exact SVD using considerably less computation by using an appropriate distribution to sample the documents we include in our SVD analysis.
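
To make the idea of sampling documents before computing an SVD concrete, the following is a minimal sketch and not the procedure evaluated in this paper. The text above does not state which sampling distribution is used, so the sketch assumes length-squared sampling: each document (a column of the term-by-document matrix) is drawn with probability proportional to its squared Euclidean norm. The function name `sampled_svd` and its parameters `s` (number of sampled documents) and `k` (number of retained dimensions) are hypothetical names introduced only for illustration.

```python
import numpy as np


def sampled_svd(A, s, k, rng=None):
    """Approximate the top-k left singular vectors of A from s sampled columns.

    A   : (terms x documents) dense array
    s   : number of document columns to sample (with replacement)
    k   : number of singular vectors/values to keep
    rng : seed or numpy Generator (assumption: any value accepted by default_rng)
    """
    rng = np.random.default_rng(rng)

    # Assumed distribution: probability proportional to squared column length.
    col_norms_sq = np.sum(A * A, axis=0)
    total = col_norms_sq.sum()
    if total == 0:
        raise ValueError("matrix has no nonzero columns to sample")
    probs = col_norms_sq / total

    # Draw s document indices; zero-length columns have probability 0.
    idx = rng.choice(A.shape[1], size=s, replace=True, p=probs)

    # Rescale each sampled column so that E[C @ C.T] equals A @ A.T.
    C = A[:, idx] / np.sqrt(s * probs[idx])

    # Exact SVD of the much smaller sampled matrix.
    U, sing, _ = np.linalg.svd(C, full_matrices=False)
    return U[:, :k], sing[:k]
```

With `U_k, sing_k = sampled_svd(A, s, k)`, the product `U_k @ (U_k.T @ A)` gives a rank-`k` approximation of `A`. The SVD itself is computed on only `s` columns, which is where the savings in computation would come from when `s` is much smaller than the number of documents.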