Modern microscopy imaging instruments can acquire very large quantities of images. However, the manual analysis of terabyte-sized image collections, together with insufficient software and computational hardware resources, prevents scientists from making new discoveries, increasing the statistical confidence of data-driven conclusions, and improving the reproducibility of reported results.
The motivation for our work comes from experimental systems for imaging and analyzing human pluripotent stem cell cultures at spatial and temporal coverages that yield terabyte-sized image data. The objective of such an unprecedented cell study is to characterize specimens with high statistical significance in order to guide the repeatable growth of high-quality stem cell colonies. To pursue this objective, multiple computer and computational science problems have to be overcome, including image correction (flat field, dark current, and background), stitching, segmentation, tracking, re-projection, feature extraction, data-driven modeling, and finally the representation of large images for interactive visualization and measurement in a web browser; a sketch of one such correction step follows below.
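To make the correction step concrete, here is a minimal sketch of flat-field and dark-current correction using the conventional formula corrected = (raw - dark) / (flat - dark), rescaled by the mean gain. The function name and NumPy-based implementation are illustrative assumptions, not the actual NIST pipeline code.

```python
import numpy as np

def flat_field_correct(raw: np.ndarray,
                       dark: np.ndarray,
                       flat: np.ndarray) -> np.ndarray:
    """Correct a raw image using dark-current and flat-field reference images.

    Applies the conventional formula
        corrected = (raw - dark) / (flat - dark) * mean(flat - dark),
    which removes the additive sensor offset (dark current) and the
    multiplicative illumination non-uniformity (flat field).
    """
    # Per-pixel gain estimated from the reference images.
    gain = flat.astype(np.float64) - dark.astype(np.float64)
    # Guard against divide-by-zero in dead or saturated pixels.
    gain = np.clip(gain, a_min=np.finfo(np.float64).eps, a_max=None)
    corrected = (raw.astype(np.float64) - dark) / gain
    # Rescale so the corrected image keeps the original intensity range.
    return corrected * gain.mean()
```

In a terabyte-scale setting, this per-tile operation is typically applied to each field of view before stitching, since the references are acquired once per imaging session.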
I will give an overview of several size, complexity, and speed challenges related to big-data measurements, and show some current web-based solutions deployed at NIST and elsewhere that have enabled new insights in cell biology from TB-sized images. Interactive access to about 3 TB of image and image feature data is available at https://isg.nist.gov/deepzoomweb/. I will also discuss open research problems and possible collaborative opportunities.
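To illustrate why browser-based access to TB-sized images is feasible at all, here is a minimal sketch of the pyramid geometry behind Deep-Zoom-style tiled viewers such as the one at the URL above. The tile size, function name, and example image dimensions are illustrative assumptions, not details of the NIST deployment.

```python
import math

def deep_zoom_levels(width: int, height: int, tile_size: int = 256):
    """Enumerate pyramid levels for a Deep-Zoom-style tiled image.

    Each level halves the previous level's dimensions, down to 1x1 at
    level 0. Yields (level, level_width, level_height, tile_cols, tile_rows).
    """
    max_level = math.ceil(math.log2(max(width, height)))
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)  # downsampling factor at this level
        w = math.ceil(width / scale)
        h = math.ceil(height / scale)
        yield level, w, h, math.ceil(w / tile_size), math.ceil(h / tile_size)
```

For a hypothetical 500,000 x 400,000 pixel stitched mosaic (0.2 terapixels), the deepest level alone contains about 1,954 x 1,563, or roughly 3 million 256-pixel tiles; the viewer stays interactive because the browser fetches only the handful of tiles covering the current viewport and zoom level.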