This course covers the topics needed to solve problems involving data: preparation (collection and integration), characterization and presentation (information visualization), analysis (machine learning and data mining), and products (applications).
- Data visualization
- Data wrangling and pre-processing
- MapReduce and the new software stack
- Data mining: finding similar items, mining data streams, frequent itemsets, link analysis, mining graph data
- Machine learning: k-nearest neighbors, decision trees, naive Bayes, regression, ensemble methods, support vector machines, k-means, spectral clustering, hierarchical clustering, dimensionality reduction, evaluation techniques
- Applications: recommendation systems, advertising on the Web
"Mining of Massive Datasets" by Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman; 2nd Edition (December 2014), Cambridge University Press. Available free online at <http://www.mmds.org/>.
Coursework consists of homework assignments and a semester-long project.