This course covers the topics needed to solve problems involving data: preparation (collection and integration), characterization and presentation (information visualization), analysis (machine learning and data mining), and products (applications).
Please note that courses for which a student has received a grade of D cannot be used to satisfy prerequisite requirements.
- Data visualization
- Data wrangling and pre-processing
- Map-reduce and the new software stack
- Data mining: finding similar items, mining data streams, frequent itemsets, link analysis, mining graph data
- Machine learning: k nearest neighbor, decision trees, naive Bayes, regression, ensemble methods, support vector machines, k-means, spectral clustering, hierarchical clustering, dimensionality reduction, evaluation techniques
- Applications: recommendation systems, advertising on the Web
"Mining of Massive Datasets" by Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman; 2nd Edition (December 2014), Cambridge University Press. Available free online at <http://www.mmds.org/>.
Homework assignments and a semester-long project
- will be prepared to contribute to a rapidly changing field by acquiring a thorough grounding in the core principles and foundations of computer science (e.g., techniques of program design, creation, and testing; key aspects of computer hardware; algorithmic principles).
- will acquire a deeper understanding of (elective) topics of more specialized interest, and be able to critically review, assess, and communicate current developments in the field.
- will be prepared for the next step in their careers, for example, by having done a research project (for those headed to graduate school), a programming project (for those going into the software industry), or a business plan (for those going into startups).