Projects for
514
This page has not been updated for Spring 2006. Please disregard.
These projects are designed to potentially generate
publications. There are
fewer
projects than there are students, and students at Stony Brook are
getting the
same projects, so I expect overlaps.
The work can be done individually, or you can pair up (subject
to my
approval). The best
results on
each project will be considered for sending to some appropriate
conference or
workshop.
Some of the projects are experimental, and some are
theoretical. The
theoretical ones
are (surprise!) harder.
You can
also propose your own project.
If
you are interested in one of the projects below, you should:
a.
Read the necessary papers. Then come
talk to me
by February 7st.
By
then you should have a partner (if you want one), a project, and some
initial
thoughts of what you might want to test or prove.
b.
Produce a written project proposal by February
14th. About a page or two,
containing the
details of your project.
For a
theoretical project: what you will try to prove;
what you plan on trying to prove
as a fallback; approaches.
For an
experimental project: a detailed list of experiments, including what
data
structures will be compared; a description of how you will generate or
gather
the data for your trials; perhaps what deviations for the theory you
will
try. In either case, a
related
work section. Must be
typed
(preferably in latex).
c.
A progress report by the ides of March. This should include working
code and
some performance graphs (if appropriate) or lemmas you have proven or
counterexamples to conjectures (if appropriate).
d.
A first draft is due 2 weeks before the end of
classes, and a
final report is due on the penultimate day of class. The last day of class we will also have project
presentations. Each
project will
have 5 minutes.
Projects:
Papers
for the projects can be found at http://www.cs.sunysb.edu/~bender/648/papers.
- A
search tree can be laid out in an array so as to minimize the
number of
memory transfers when the tree is searched. In a Cache-Oblivious layout, the swapping is optimal (up to a constant) with
respect
to any page size.
There are a
couple of papers that yield an optimal cache-oblivious
solution. This project would
involve
implementing and comparing various choices (how does greedy
heuristic
compare with dynamic programming of actual instances, etc.). How does the cache-aware
solution
compare to the cache-oblivious.
What are the constants like?
Can you develop find upper and lower bounds on the number
of memory
transfers needed to actually layout the tree optimally? (The
bounds in the
papers are non-uniform: they are on the number of transfers to
traverse
the tree, not to actually lay out the tree.) Is there a adaptive layout that converges on a
close-to-optimal
layout quickly?
- Can
you do Least Common Ancestors optimally in o(n) extra space? What about Level
Ancestors?
- Can
you make large-alphabet suffix trees dynamic and still fast? (You
can
always insert a new string with a log-alphabet overhead, but the
point is
to do it in linear time.)
- An
implementation and comparison of various Cache Oblivious B-trees,
versus
each other and versus Cache Aware B-trees. Testing on simulators versus measuring running
time,
number of page faults, cache misses, disk prefetching
effects. You can also implement
this on the
memory server of Gene Stark at Stony Brook.
- Binary
search trees have some multiplicative overhead. Getting the constant down is a good
thing.
Can the packed memory structure be used for small memory
search
trees? How small can
you get
the memory overhead to be, compared with standard search trees,
and still
yield competitive results?
Also, can you prove some small constant overhead if you
expect
random insertions?
- Knuth
et alia pioneered the study of data structures where you
alternate between
uniformly inserting one element and uniformly deleting one
element. Search trees, e.g., have
intriguing behavior in this model.
What happens to packed memory structures under this
model? Either experimentally or
theoretically.
- The
list labeling problem has matching upper and lower bounds of
O(log
n). The closely
related
packed memory ds has an upper bound of O(log2 n). Can you give a matching
lower
bound? Nothing
better than
log is known. (Hard)
- Compare
Iocono splay trees with traditional splay
trees.
- Can
either the simplified LCA or LA algorithms be made dynamic and
still
simple?
- Take
any well-known complicated data structure and produce a simple
version. PQ-trees,
cutting
and linking trees, q-heaps, etc.