generation has considered one or another data set they had as being
"massive". Today, we arguably have truly massive datasets.
This is primarily because
we now collect data via automatic feeds from the internet, from satellites,
from the financial industry,
from sensors, and so on, which challenges the transmitting (T), computing
(C), and storage (S) capacity of our systems.
Dealing with this challenge to TCS calls for fundamental ideas and
techniques from many areas, and a
new mindset to embed them all into novel systems. This course will explore
all aspects of meeting this
challenge. We are looking for students from any (or all!) of
the following areas:

- What analysis is feasible and useful on massive data?
- How do we design new algorithms that work within the resource constraints?
- Is there a theory of what can be processed on massive streams?
- What theory of random variables and sampling underpins many of the algorithms?
- What fundamental questions arise in Harmonic Analysis and related mathematical areas?
- How do we extend the current database technology to deal with massive streams?
- Which networking issues can be resolved using massive data analysis on traffic?
- How do we build a massive information-gathering infrastructure using sensors?
- Can we build specialized hardware to process massive data streams?
- What special programming language support is needed?
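Many of the questions above come down to processing a stream in one pass with memory far smaller than the data. As an illustrative sketch (not part of the course materials), reservoir sampling keeps a uniform random sample of a stream of unknown length using constant space:

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Keep a uniform random sample of k items from a stream of
    unknown length, using O(k) memory and a single pass."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random reservoir slot with
            # probability k / (i + 1), which keeps the sample uniform.
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir

random.seed(1)
sample = reservoir_sample(range(1_000_000), 5)
```

The point of the sketch is the resource constraint: the sample stays uniform no matter how long the stream runs, while memory use never grows.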
We meet once a week
for 2 hours; tentatively, Thursdays, 6--8 PM. The first few
weeks will be lectures on the basics. In the second half of the course, you
will present research papers, projects, or
perspectives from the different fields, drawing on whatever
is your area of expertise or interest. There is
scope to explore only programming projects or only mathematical papers.
Lecture 1: pdf.
Lecture 2: pdf.
Lecture 3: pdf.
Lecture 4: pdf.
Lecture 5: pdf.
for generating values from stable distributions (Lecture 4)
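The note above refers to generating values from stable distributions. A minimal sketch, assuming a symmetric alpha-stable generator is what is intended, using the Chambers-Mallows-Stuck transformation (the function name and interface here are illustrative, not from the lecture):

```python
import math
import random

def symmetric_stable(alpha, rng=random):
    """Draw one symmetric alpha-stable variate (0 < alpha <= 2)
    via the Chambers-Mallows-Stuck transformation."""
    theta = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform angle
    w = -math.log(1.0 - rng.random())               # Exp(1) variate
    if alpha == 1.0:
        return math.tan(theta)                      # Cauchy special case
    return (math.sin(alpha * theta) / math.cos(theta) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * theta) / w) ** ((1.0 - alpha) / alpha))

random.seed(0)
# alpha = 1 gives Cauchy draws, the 1-stable case used in sketching
# algorithms; alpha = 2 would give (scaled) Gaussian draws.
cauchy_draws = [symmetric_stable(1.0) for _ in range(10000)]
```

Stable distributions matter here because sums of stable variates are themselves stable, which is what makes stable random projections work for estimating norms of massive streams.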