198:541 Database Systems

Wednesdays 10:20am-1:20pm, LSH-A143

Instructor: Amélie Marian
Office hours: TBA, CoRE 324

TA: TBA
Office hours: TBA


Announcements

1/12: The class Sakai worksite is set up. Announcements, lecture notes and homework will be posted through Sakai. If you are registered for the class and cannot access the class Sakai website, please contact the instructor immediately.


Course Description

This course focuses on advanced topics in Database Management Systems and Web Data. We will discuss recent advances in data management through readings of research papers. Students will also work on a semester-long class project related to advanced data management topics such as data integration, data mining, search, and query processing.


Readings

Readings in Database Systems (RedBook), 5th edition, Peter Bailis, Joseph M. Hellerstein, Michael Stonebraker, editors. Available online at www.redbook.io.
Individual papers will be made available on Sakai.

Plus additional readings from recent database research papers.


Grading

35% Project
30% Database Conference (includes presentation and participation in discussions)
10% Quizzes
25% Midterm Exam


(Tentative) Schedule

Date

Lectures and Readings

January 18

Traditional RDBMS Systems and DB Techniques Everyone Should Know

Readings:
RedBook Chapters 1, 2 and 3


January 25

Boolean Information Retrieval.
Information Retrieval: Scoring and Ranking.
Web Search.

Readings:
Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
(Chapter 1 "Information Retrieval Using the Boolean Model",
Chapter 6 "Scoring & term weighting",
Chapter 7 "Vector space retrieval",
Chapter 8 "Evaluation in information retrieval",
Chapter 21 "Link Analysis").

Justin Zobel, Alistair Moffat, and Kotagiri Ramamohanarao: Inverted Files Versus Signature Files for Text Indexing, in ACM Transactions on Database Systems, Vol. 23, No. 4 (1998), Pages 453-490.

The Anatomy of a Large-Scale Hypertextual Web Search Engine, Brin, Page, 1998

February 1

Top-k Queries

Readings (available on Sakai):
Optimal aggregation algorithms for middleware. Fagin, Lotem and Naor, PODS 2001.
Evaluating top-K queries over web-accessible databases. Marian, Bruno, Gravano, ICDE 2002.
Shooting Stars in the Sky: An Online Algorithm for Skyline Queries. Kossmann, Ramsak, Rost, VLDB 2002.
A Survey of Top-k Query Processing Techniques in Relational Database Systems. Ilyas, Beskales, Soliman, ACM Computing Surveys (CSUR) 40(4), Article 11, 2008.

February 8 and 15

Web Data and Data Integration

Readings (available on Sakai):
RedBook Chapters 10 and 12
Data Integration: The Teenage Years, Alon Halevy, Anand Rajaraman and Joann Ordille, VLDB 2006.
WebTables: Exploring the Power of Tables on the Web. Michael J. Cafarella, Alon Halevy, Daisy Zhe Wang, Eugene Wu, Yang Zhang. VLDB, 2008.

February 22

Data Analytics. Data Warehousing and OLAP. Data Mining.

Readings (available on Sakai):

RedBook Chapter 8
Implementing Data Cubes Efficiently. Venky Harinarayan, Anand Rajaraman, Jeffrey D. Ullman. SIGMOD 1996
Fast Algorithms for Mining Association Rules in Large Databases, Agrawal and Srikant, VLDB 1994

March 1

New DBMS Architectures.

Readings (available on Sakai):

RedBook Chapter 4
C-Store: A Column-oriented DBMS. Stonebraker et al. VLDB, 2005.
Bigtable: A Distributed Storage System for Structured Data. Chang et al. OSDI, 2006.

March 8

Midterm Exam (in class)

March 15

Spring Break

March 22

Recommender Systems
Recent Database conference papers (student presentations)

March 29

Recent Database conference papers (student presentations)

April 5

Recent Database conference papers (student presentations)

April 12

Recent Database conference papers (student presentations)

April 19

Responsible Use of Data: Data Ethics

April 26

Project Presentations