Number: 16:198:672
Time: Thursdays 3:20-6:20
Room: Hill 260 (BioMaPS seminar room) (Note: room changed)
Instructor: Alexander Schliep
Email: schliep@cs.rutgers.edu. Please use F09-672 in subject.
Office Hours: Mondays 2-4pm and by appointment.
Hidden Markov Models (HMMs) are an important class of stochastic models which were first applied and popularized in the context of automatic speech recognition by Rutgers' distinguished faculty member Lawrence Rabiner and his colleagues at Bell labs.

In recent years they found wide-spread use in analysis of data from molecular biology. They constitute the state-of-the-art in searching for remote homolog protein sequences and, with mild extensions, in identifying genes (or other signals) in DNA sequences. In addition to these stochastic models of sequences of discrete symbols, continuous-valued sequences can be modeled. These arise for example from time-course experiments measuring gene expression during the cell-cycle, in response to stimuli or during the development of organisms. Similarly the analysis of genomic tiling array data for chromosomal aberrations, location of transcription factor binding sites or identification of expressed regions is a typical segmentation problem for which HMMs are very well suited.
The popularity of HMMs is based on their ease in terms of creating complex models, their simple stochastic structure and their polynomial-time algorithms for all relevant operations. Nevertheless, there have been exciting recent developments to arrive at reasonable running time and memory requirements even for genome-sized data sets. Similarly, theoretical aspects of HMMs and their estimation or training are still an active area of research.
In this course we will introduce the necessary theory, the relevant algorithmic developments, and, through hands-on projects using the GHMM (http://ghmm.org), some of the engineering aspects of solving computational biology problems with HMMs. An emphasis will be put on recent developments in the field.
For CS students: note that HMMs are applied to a wide range of non-biological problems from fault detection in computer systems, over handwriting recognition, to predicting crises in the middle East based on newsfeed-data.
Attendance and participation is expected and will count toward the grade. Other components are problem sets which will have to be handed in, class project(s) and a preparation of a term presentation based on an original research paper.
Review of elementary probability and information theory, Markov chains, positional weight matrices and sequence logos, Hidden Markov Models, forward/backward-algorithm, Viterbi and posterior decoding, Baum-Welch training (i.e., the Expectation Maximization algorithm), pair-wise sequence alignments, a probabilistic interpretation of alignment scores, analyzing DNA sequences: gene finding with labelled HMMs and k-best decoding, finding transcription factors binding sites, detecting remote homolog protein sequences with profile HMMs, Dirichlet priors, HMMs for continous-valued observations, analyzing gene expression time-courses with HMMs, mixture models, analyzing tiling DNA micorarray data.
Depending on interests and class composition further topics include: distance functions between HMMs, identifyability, learning HMM topology, numerically stable implementations of the basic algorithms, efficient MCMC approaches for full Bayesian analysis with HMMs, memory-efficient versions of the Viterbi, algorithmic improvements for repetitive sequences.
The course will be based on class notes, individual chapters of (a) textbook and original papers.
See the Sakai Wiki for the schedule.