Course Details

  • Course Number: 16:198:555
  • Course Type: Graduate
  • Semester 1: Spring
  • Credits: 3
  • Description:

    This is the second class in a sequence. It takes a hands-on approach, for students in the MSDS professional program, to building working systems that process data at scale. The systems built will be evaluated on three criteria: Feasibility, Usefulness, and Novelty. A panel of industry professionals and faculty will judge the completed projects. Best-project awards will be given, and the winners will be invited to present their projects at an annual Industry-Faculty gala ceremony. The first capstone class focuses on interface development; the second concentrates on large-scale analysis and sense making.

    Course Material: 


    The electronic version of this book is available from the Eurographics Digital Library at

    Sample Project Categories (A Non-Exhaustive Guide)

      1. Similarity Search, Recommender Systems and Collaborative Filtering
      2. Data Retrieval and Topic Waves
      3. Prediction and Verification
      4. Transaction Driven Data
      5. Medical Imaging
      6. Computer Aided Manufacturing
      7. Apps for: Data Erasure, Nameless File Systems, Password Boxes, Phonetic Lyrics Search
      8. Sentence Completion, Digital Signatures, …

    Selected papers from the literature on

      – Algorithmic Analytics, Visualization, and Computer-Human Interaction

      – Data Ethics, Privacy, Security, Sharing, and Provenance

  • Prerequisite Information:

    Languages:  C/C++, Java, JavaScript, Python

    16:198:554, 16:198:543 (Massive Data Storage and Retrieval), 16:198:550 (Massive Data Mining)/16:198:535 (Pattern Recognition) or 16:198:550 (Massive Data Mining)/16:198:536 (Machine Learning), or 16:198:535/16:198:536 (Pattern Recognition-Machine Learning)

  • Topics:

    This is a specialized second class in a Data Science project sequence, organized in the five phases described below. The class builds on knowledge acquired in two other novel Data Science classes, namely:

    (CS526: DIVA) Data Interaction - Visual Analytics and (CS543: MSR) Massive Data Storage and Retrieval.

    Projects can be chosen from an evolving Capstone Faculty Project List or submitted for approval by the Capstone Faculty Committee. Projects will be judged by a faculty panel and interested industry sponsors.

    Guiding evaluation principles will be: the “value” of the information extracted from the chosen data set, the methods and models used, and the interactivity of the final application.


    Phase 1 (Background, Weeks 1 and 2). During the first two weeks, students will be exposed to the fundamental principles of data analytics and visual interaction as described in the VisMaster book.

    Phase 2 (FUN Project Selection, Week 3). During week 3, each student will present a project concept to the class and faculty, with a feasible plan for completion. A subset of the projects will be selected for continuation. Students whose projects are selected will become project leaders. Each leader will select two non-leader partners who are willing to commit to the successful completion of the project. Faculty will choose the Feasible, Useful, and Novel (FUN) projects they are willing to supervise and sponsor.

    Phase 3 (Project Prototype Progress Report, Weeks 4, 5, and 6). Development and evaluation of an operational prototype. Faculty will evaluate project progress and assess whether each project can feasibly be completed by the end of the semester. Only projects judged feasible will be allowed to move forward. Students whose projects are not selected will be assigned as testers and project writers for the projects moving forward.

    Phase 4 (Pre-Final Defense, Weeks 7, 8, 9, and 10). Faculty sponsors will meet with their sponsored projects in alternating weeks to monitor progress. During week 10, each project will give a pre-final defense presentation. One third of the projects will be chosen for a final gala presentation (in week 14) in front of the class and a faculty/industry jury.

    Phase 5 (Gala Presentation, Week 14). The best five projects will be selected, MSDS project awards will be distributed, and the winning projects will be formally recorded in the planned MSCS Wall of Fame.

  • Learning Goals:

    Objective: Expose students to, and train them in, all facets of building scalable data processing systems.

    The goal is to identify questions that can be addressed with analytic tools that amplify users' understanding of data findings, and to build the corresponding computer-human systems that aid human data synthesis.

    Data Sets: Students will choose a data set of interest and devise representation methods that are conducive to efficient algorithmic exploration, interactive visualization, analysis, summarization, and sense making. The data sets used may be real or artificial. Typical data sets that may be considered include: data feeds from Twitter, YouTube, news streams, stocks, financial transactions, joke collections, movies, songs, image repositories, transaction ledgers, online encyclopedias (e.g., OEIS, algorithm and software repositories), transportation schedules, data analytics blogs, funding agencies, startups, computer science educational materials, and the Internet of Things.

  • Notes:

    Guiding evaluation principles will be: the “value” of the extracted information from a variety of data sets at different scales, the methods and models used, interface interactivity, and the final application usefulness. 


      1. Biweekly Write-ups 15%
      2. Midterm Exam or equivalent 15%
      3. Mid-Project Draft 20%
      4. Final Project Demo 50% (evaluated by a Faculty-Industry Panel)
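
    As an illustration of how the weights above combine, here is a minimal sketch that computes a weighted course score. The component names, the 0-100 scale for each component, and the sample scores are assumptions made for the example; the course does not publish a grading formula beyond the percentages listed.

```python
# Weights copied from the grading list above (percent of the final grade).
# The dictionary keys are hypothetical names chosen for this sketch.
WEIGHTS = {
    "biweekly_writeups": 15,
    "midterm": 15,
    "mid_project_draft": 20,
    "final_project_demo": 50,
}


def final_score(scores):
    """Return the weighted course score on a 0-100 scale.

    `scores` maps each component name to a score in [0, 100]
    (an assumed scale, not one stated by the course).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical student scores, for illustration only.
example = {
    "biweekly_writeups": 90,
    "midterm": 80,
    "mid_project_draft": 85,
    "final_project_demo": 70,
}

print(final_score(example))  # 77.5
```

    Because the final demo carries half of the weight, a ten-point change in the demo score moves the total by five points, as much as the write-ups and the draft combined would over the same change (1.5 + 3.5 points).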