
Introduction to Data Science


This course covers the topics needed to solve problems involving data, including preparation (collection and integration), characterization and presentation (information visualization), analysis (machine learning and data mining), and products (applications).

- A grade below a "C" in a prerequisite course will not satisfy that prerequisite requirement.

Topics:

- Data visualization

- Data wrangling and preprocessing

- MapReduce and the new software stack

- Data mining: finding similar items, mining data streams, frequent itemsets, link analysis, mining graph data

- Machine learning: k-nearest neighbors, decision trees, naive Bayes, regression, ensemble methods, support vector machines, k-means, spectral clustering, hierarchical clustering, dimensionality reduction, evaluation techniques

- Applications: recommendation systems, advertising on the web

Course Material: 

"Mining of Massive Datasets" by Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman; 2nd Edition (December 2014), Cambridge University Press.  Available free online at <>.

Expected Work: 

- Homework assignments and a semester-long project

- Midterm and final exams

Learning Goals: 
Computer Science majors ...
  • will be prepared to contribute to a rapidly changing field by acquiring a thorough grounding in the core principles and foundations of computer science (e.g., techniques of program design, creation, and testing; key aspects of computer hardware; algorithmic principles).
  • will acquire a deeper understanding of (elective) topics of more specialized interest, and will be able to critically review, assess, and communicate current developments in the field.
  • will be prepared for the next step in their careers, for example, by having completed a research project (for those headed to graduate school), a programming project (for those going into the software industry), or a business plan (for those going into startups).