198:336 Section 3, Spring 2005
Information and Database Management
LECTURE NOTES [Readings]
(Note: not all lectures are guaranteed to be available on the web.)
-
Introduction to Information Management. Jan. 18
-
Logic-based Information Managers. Jan. 20, 25, 27
-
[Sections 24.1 to 24.4 of Elmasri & Navathe,
"Fundamentals of Database Systems", second edition. On permanent
reserve in SERC Library now.] (The treatment of Datalog in
your textbook unfortunately is too closely tied to material we have not
yet covered.)
-
Sections on Datalog from Silberschatz et al. (quite
short) and Elmasri & Navathe, on reserve.
-
"A Prolog Primer" by Jean Rogers, up to
Chapter 5 (Lists), also on reserve
-
Conceptual Modeling and UML: Feb. 1, 3, 8
-
Relational Databases and SQL
-
ER diagrams and Relational Design
-
Constraints, Views and Triggers in Relational DBMS
-
Constraints - March 23
-
Views. Triggers. - March 25
(Sorry, but you should try to print these from a Macintosh or using
Adobe 6.0.0.2 or later.)
-
Read: Sections 5.8 and 5.9, plus a few pages on reserve
from the work of Ceri and Widom
-
Application Programming with Databases. March 29
-
XML: March 31, April 5, April 7.
-
Text Information Retrieval April 14, 19
Copyright policy concerning material on
this web site: Please remember that it takes effort for people to produce
this "intellectual capital". I can do no better than quote my esteemed
colleague, Professor Vasek Chvátal: "Whoever links to these notes
will show exquisite taste. Whoever copies these notes (without permission)
will be prosecuted to the hilt." Of course, I try to acknowledge those
on whose work this course is built, usually with their permission: Their
(growing) list includes, in alphabetical order, Andrew
McCallum, Enrico Franconi, Alon Halevy, Rao Kambhampati, Weiying Meng,
Raghu Ramakrishnan, Dan Suciu, Jeffrey Ullman.
General Course Information and Requirements
-
Instructor: Professor Alex Borgida (borgida at cs rutgers edu)
Office hours: Thursday 3-5pm; Core 315 (phone 732 445 4744)
-
Teaching Assistant: Weijun He
-
Co-ordinates: lecture: Tuesday & Thursday 7:40-9:00pm, ARC 107
recitation: Tuesday 6:35-7:30pm, ARC 107
-
Textbook
-
Database Management Systems, 3rd edition (!!!), McGraw-Hill, 2003.
(Surprisingly, it also covers some XML, Information Retrieval & Data Mining.)
-
BUT: considerable material will be covered
which is not in books. Students are responsible for
knowing all (and only) the material covered in lectures, and class attendance
will be taken.
-
Expected work and grading:
-
Term work (40%):
-
homework assignments (including small programming exercises)
-
course project (teams of 2): designing a complete application
that uses a DBMS (Oracle/MySQL) to power a web interface (in XML/HTML)
-
Tests (60%):
-
midterm test: March 8th, in lecture
-
final exam: May 10th, 8-11pm, in lecture room
-
Policy on grading:
-
In giving final grades (esp. low ones, but also at boundaries) I consider
each case carefully on its merits, including revisiting the final exam,
the programming projects, their grading, and what knowledge they indicate.
Some other things that make me positively inclined are attendance in class
and recitation. (Some things that make me negatively inclined are failing
to submit programs and assignments, or copying on these.)
-
If you have problems during the course, I strongly encourage you
to come talk to me. (In fact, I welcome the opportunity to talk to all
students outside the class.) To be fair to other students, current and
past, unfortunately we have to be entirely consistent, and stick to a longstanding
policy we've evolved of denying requests to change final
grades, even by doing additional work. The only exceptions to this
are based on letters from the Dean. (I regret having to start
with such a blunt message, but past experience ...)
-
Policy on Academic Integrity: please read the DCS policy statement
on academic integrity, to which this course will adhere. Please
don't jeopardize your academic career by copying, or collaborating beyond
permissible limits.
OVERVIEW
-
Course principles
The good/bad news: as the old car commercial says, "This is not
your father's database course!"
The course this term will cover aspects of using various kinds of information
in the age of the internet, including structured data from databases, semi-structured
data like XML and hyperlinked web pages, unstructured text, and even knowledge
from which you can infer things. In general, the course will consider various
languages for representing and accessing the different kinds of information,
methodologies for using them, theoretical principles underlying them, and some
fundamental algorithms. [However, in contrast to standard database courses, we
will be very skimpy on implementation aspects of DBMS, such as data storage
and transaction processing. In particular, it is best to think of this
course as one intended to teach how to use things like DBMS, not
how they are implemented. In other words, think of it as the analogue of the first
part of a programming languages/compilers sequence.]
-
Prerequisites
198:205 Discrete Structures I, for logic.
198:112 Data Structures, for data access techniques (hashing, search
trees)
-
Topics
The following is a tentative list of topics
to be covered, though the order is not necessarily fixed. (Optional topics
are marked with *.)
-
A functional view of an Information Service, illustrated
with logic
-
The I.S. as an abstract datatype with Ask and Tell
operations: a
unifying framework for the course
-
Datalog Information Service, using Prolog as a query
evaluation engine
-
Review of Logic, and its use as specification of I.S.
services
-
Conceptual models: organizing
information in ways natural to humans
-
Semantic Networks as Knowledge Representation (and what is
wrong with them)
-
Extended Entity Relationship model
-
UML Class model
-
*Ontology languages/foundations for the semantic web: Description
Logics
-
Methodology for Conceptual Modeling
-
Structured Data: Relational
Databases
-
Database Management System services
-
SQL query language (and relational algebra)
-
Advanced features of SQL: views, triggers, integrity constraints
-
Accessing DBMS from the Web
-
Methodology of Relational Schema Design
-
From EER Conceptual Models
-
*Using "functional dependency theory"
-
Unstructured Data: Text ("Information
Retrieval")
-
Boolean retrieval
-
Vector-space model of Information Retrieval, including TF-IDF
-
Evaluation of IR systems
-
Semi-structured Data: XML &
the Web
-
XML: data as labelled trees ("the universal communication
language")
-
Imposing structure: DTDs
-
Querying: XPath and XQuery
-
Storing XML in databases
-
The WEB
-
*Architecture of a search engine (Google)
-
Ordering pages retrieved using link and anchor
information in web pages: Hubs & Authorities, PageRank (Google)
-
Deductive Knowledge: inferring
new information
-
Deduction in Propositional, First Order Logic, Description
Logics (see above)
-
?Expert system rules?
-
Information Integration: the
biggest problem today
-
Data warehouse/mediator architectures
-
*Schema Integration
-
Implementation aspects of DBMS seen
by the user (as time permits)
-
indexing and basic query optimization
-
transactions and concurrency control in DBMS
If you have further questions, please feel free to email me: borgida at cs