• Course Number: 01:198:310
  • Instructor: STAFF
  • Course Type: Undergraduate
  • Credits: 1
  • Description:

    This is a 1-credit capstone course for the Data Science Certificate Program. To register, students must have completed the three core courses in the Certificate program and must have completed, or be concurrently enrolled in, a domain course for the Certificate. The course is a largely independent empirical project that makes concrete use of all aspects of data science explored in the core courses, and possibly in the domain course: identifying a data source, cleaning and organizing the data, conducting appropriate statistical analysis, and interpreting and reporting the results of the study in a standard scholarly form. Finding an appropriate project and a data source with suitable characteristics for a data science-oriented study is an important component of the course.

    For example, the data set at https://ourworldindata.org/policy-responses-covid, with over 40 attributes and nearly 50,000 records, could provide a good basis for data analysis, prediction, and hypothesis testing, and can lead to an exciting data-driven presentation.

    Students can find or propose data sets (subject to approval by the instructor) or work with a default data set provided by the instructor.

    Students will have to demonstrate skills learned in the Data Science Certificate classes, such as data wrangling, statistical data analysis (including the application of machine learning methods), database management, plotting and visualization, and telling a story with the data.

  • Prerequisite Information:

    Prerequisites: 01:198:142/01:960:142 (Data 101/Data Literacy), 01:960:291 (Statistical Inference for Data Science) and one of the following: 01:198:210 (Data Management for Data Science), 01:960:295 (Data management and wrangling with R), or 04:547:221 (SCI Data management course).

  • Expected Work: The class will meet once a week for 80 minutes during the last 10 weeks of the semester (this is a 1-credit course, so this is the appropriate number of class meetings). Most class periods will consist of progress reports and updates by students on the various stages of their projects. Ideally, an appropriate topic and at least one feasible data source will have been identified by the first class meeting, so that progress in transforming the data into a usable form, preliminary visualization and summary “sizing up” of the nature of the data, appropriate statistical analyses, and so forth, may commence in short order. Final presentations of projects will be given in the last few weeks of the class, and the final report on the project will be due on the scheduled final exam date for the class. There will be no exams, but completion of necessary progress at regular checkpoints during the semester will be required.
  • Exams: Written report plus final presentation of results.
  • Learning Goals:

    The learning objectives of the course correspond to the broadly stated learning goals for the Certificate in Data Science. The completed project should exhibit an overall synthesis of these learning goals for a particular research topic:

    Students will be able to:

    1. Visualize (plot) data relationships in meaningful ways.

    2. Transform and map data (otherwise called data wrangling) from one "raw" data form into another format for further analysis, querying, learning, and prediction.

    3. Acquire data management skills such as database design and database querying.

    4. Execute data analyses with professional statistical and machine learning software.

    5. Understand the conceptual basis of analyses used in data science.

    6. Apply data science concepts and methods to solve real-world problems.

    7. Back up a story with data, and critique conclusions that are not justified by the data.