PSR paper

Designed for problems with reset. Would this also make sense for episodic problems more generally? (That is, the learner can't force a reset, but the environment occasionally returns to an initial state.)

There's something wrong with Lemma 1. What are "the linear independent columns of Z"? Is it a maximal such set? What is U in the lemma? (Are we assuming the environment can be represented by a POMDP?)

We can't quite use rank(i) = rank(i+1) as a stopping criterion in practice, since noise will make the rank appear larger than it is. We probably want some sort of scree test to check whether the smallest dimensions are actually significant. (I suspect this is what Jaeger referred to in his paper.)

I'm surprised they couldn't prove their two-dimensional version of my one-dimensional proof...

IO-OOM paper

What's a learning algorithm for IO-OOMs? Would it imply a learning algorithm for PSRs?
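On the PSR paper's rank(i) = rank(i+1) stopping criterion: below is a minimal numerical sketch, assuming numpy, of why exact rank fails under noise and of one scree-style alternative. The helper names (`rank_by_threshold`, `rank_by_largest_gap`) and the "largest ratio between consecutive singular values" rule are hypothetical illustrative choices, not the paper's method and not necessarily what Jaeger proposed. The synthetic rank-3 matrix stands in for a noisy empirical estimate of the matrix whose rank is being tracked.

```python
from typing import Optional

import numpy as np


def singular_values(M: np.ndarray) -> np.ndarray:
    """Singular values of M in descending order."""
    return np.linalg.svd(M, compute_uv=False)


def rank_by_threshold(M: np.ndarray, tol: float) -> int:
    """Count singular values above `tol`, an assumed noise level."""
    return int(np.sum(singular_values(M) > tol))


def rank_by_largest_gap(M: np.ndarray, max_rank: Optional[int] = None) -> int:
    """Hypothetical scree-style heuristic: return the k that maximizes
    s[k-1] / s[k], i.e. the sharpest drop ("elbow") in the spectrum."""
    s = singular_values(M)
    if max_rank is not None:
        # Only consider candidate ranks 1..max_rank.
        s = s[: max_rank + 1]
    if len(s) < 2:
        return len(s)
    # Guard against dividing by an exactly-zero singular value.
    ratios = s[:-1] / np.maximum(s[1:], np.finfo(float).tiny)
    return int(np.argmax(ratios)) + 1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_rank = 3
    # Synthetic rank-3 matrix plus small sampling-like noise.
    signal = rng.standard_normal((50, true_rank)) @ rng.standard_normal((true_rank, 40))
    noisy = signal + 1e-3 * rng.standard_normal((50, 40))

    print(np.linalg.matrix_rank(noisy))       # 40: the noise makes it look full rank
    print(rank_by_threshold(noisy, tol=0.1))  # 3
    print(rank_by_largest_gap(noisy))         # 3
```

With numpy's default tolerance the noisy matrix reports full rank, so a test like rank(i) = rank(i+1) would keep finding "new" dimensions; both alternatives recover 3 here. Whether the largest-ratio elbow is reliable on real PSR estimates is an open question; tying `tol` to an estimate of the sampling noise is the other obvious option.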