High-throughput sequencing (HTS), a technology for determining genomic
sequences at large scale, is pervasive in clinical and biological
applications ranging from basic science to cancer research, and is
expected to gain enormous momentum in future personalized medicine.
To address this deluge of data, we developed new methods that
operate directly on reduced representations of the data and
enable the use of advanced statistics even on very large data
sets. For identifying copy number variants (CNVs), our approach
accelerates full Bayesian inference to the point where its running
time matches that of maximum-likelihood methods.
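
The abstract leaves the algorithmic details open; the following is a minimal
sketch, in Python, of the general idea of CNV inference on a reduced
representation: a per-position read-depth signal is collapsed into blocks of
near-constant coverage, and a per-block posterior over copy-number states is
computed from one sufficient statistic per block instead of one value per
position. The blocking heuristic, the Gaussian emission model, and the assumed
15x coverage per copy are illustrative assumptions, not the method described
above.

    import numpy as np

    def compress_depth(depth, threshold=12.0):
        """Greedy blocking: extend the current block while new positions stay
        within `threshold` of its running mean, otherwise start a new block.
        Returns block lengths and block means -- the sufficient statistics
        that downstream inference uses instead of the raw per-base signal.
        (Illustrative heuristic, not the method of the abstract.)"""
        lengths, means = [], []
        block_sum, block_len = depth[0], 1
        for x in depth[1:]:
            if abs(x - block_sum / block_len) > threshold:
                lengths.append(block_len)
                means.append(block_sum / block_len)
                block_sum, block_len = x, 1
            else:
                block_sum += x
                block_len += 1
        lengths.append(block_len)
        means.append(block_sum / block_len)
        return np.array(lengths), np.array(means)

    def copy_number_posterior(lengths, means, per_copy=15.0,
                              copies=(1, 2, 3, 4), sigma=6.0):
        """Toy per-block posterior over copy-number states under a Gaussian
        emission model and a flat prior. One likelihood evaluation per block
        replaces one per genomic position, which is where operating on the
        reduced representation saves time."""
        expected = per_copy * np.array(copies, dtype=float)
        log_lik = -lengths[:, None] * (means[:, None] - expected[None, :]) ** 2 \
                  / (2 * sigma ** 2)
        log_lik -= log_lik.max(axis=1, keepdims=True)
        post = np.exp(log_lik)
        return post / post.sum(axis=1, keepdims=True)

    # Toy signal: diploid baseline around 30x coverage with a duplicated
    # segment around 45x (assumed numbers, for illustration only).
    rng = np.random.default_rng(0)
    depth = np.concatenate([rng.poisson(30, 500),
                            rng.poisson(45, 200),
                            rng.poisson(30, 300)]).astype(float)
    lengths, means = compress_depth(depth)
    print(len(depth), "positions reduced to", len(lengths), "blocks")
    # most likely copy number per block
    print(copy_number_posterior(lengths, means).argmax(axis=1) + 1)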
Typical data sets consist of two billion sequencing reads or more,
and large studies may comprise hundreds of such data sets. Core steps
of the analysis include correcting read errors, mapping reads to a
reference genome, and identifying genetic variants. We arrive at a
reduced representation of HTS data sets through a clustering method
that scales to billions of reads. Adaptations of downstream algorithms
operate directly on the clustered representations, enabling compressive
genomics and thereby increasing the fidelity of the analysis at constant
or reduced cost.
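
How the clustering and the downstream reuse fit together is easiest to see in
a small sketch. The following assumes a minimizer-style k-mer signature as the
clustering key and a stand-in aligner; both are illustrative choices, not the
specific clustering method referred to above. What it demonstrates is the
compressive step: the expensive operation (here, alignment) runs once per
cluster, and its result is reused for every read in that cluster.

    from collections import defaultdict

    def minimizer(read, k=15):
        """Lexicographically smallest k-mer of a read -- a cheap signature
        that tends to be shared by reads from the same locus (an illustrative
        clustering key, assumed for this sketch)."""
        return min(read[i:i + k] for i in range(len(read) - k + 1))

    def cluster_reads(reads, k=15):
        """Group reads by their minimizer; returns signature -> read indices."""
        clusters = defaultdict(list)
        for idx, read in enumerate(reads):
            clusters[minimizer(read, k)].append(idx)
        return clusters

    def map_compressively(reads, clusters, align):
        """Align only one representative per cluster and reuse its placement
        for all members: downstream work scales with the number of clusters,
        not the number of reads."""
        placements = {}
        for members in clusters.values():
            rep = reads[members[0]]
            hit = align(rep)            # expensive call happens once per cluster
            for idx in members:
                placements[idx] = hit
        return placements

    # Toy usage with a stand-in aligner (assumed interface: read -> position).
    reads = ["ACGTACGTGGA", "ACGTACGTGGT", "TTGCACGTAAA"]
    clusters = cluster_reads(reads, k=8)
    print(map_compressively(reads, clusters, align=lambda r: hash(r[:8]) % 1000))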