Projects for 514

 

This page has not been updated for Spring 2006. Please disregard.

These projects are designed to potentially generate publications.  There are fewer projects than there are students, and students at Stony Brook are getting the same projects, so I expect overlaps.  The work can be done individually, or you can pair up (subject to my approval).  The best results on each project will be considered for sending to some appropriate conference or workshop.

 

Some of the projects are experimental, and some are theoretical.  The theoretical ones are (surprise!) harder.  You can also propose your own project.  If you are interested in one of the projects below, you should:

 

a.     Read the necessary papers.  Then come talk to me by February 7st.  By then you should have a partner (if you want one), a project, and some initial thoughts of what you might want to test or prove.

b.     Produce a written project proposal by February 14th.  About a page or two, containing the details of your project.  For a theoretical project: what you will try to prove; what you plan on trying to prove as a fallback; approaches.  For an experimental project: a detailed list of experiments, including what data structures will be compared; a description of how you will generate or gather the data for your trials; perhaps what deviations for the theory you will try.  In either case, a related work section.  Must be typed (preferably in latex).

c.     A progress report by the ides of March.  This should include working code and some performance graphs (if appropriate) or lemmas you have proven or counterexamples to conjectures (if appropriate).

d.     A first draft is due 2 weeks before the end of classes, and a final report is due on the penultimate day of class.  The last day of class we will also have project presentations.  Each project will have 5 minutes.

 

Projects:  Papers for the projects can be found at http://www.cs.sunysb.edu/~bender/648/papers.

 

  1. A search tree can be laid out in an array so as to minimize the number of memory transfers when the tree is searched.  In a Cache-Oblivious layout, the swapping is optimal (up to a constant) with respect to any page size.  There are a couple of papers that yield an optimal cache-oblivious solution.  This project would involve implementing and comparing various choices (how does greedy heuristic compare with dynamic programming of actual instances, etc.).  How does the cache-aware solution compare to the cache-oblivious.  What are the constants like?  Can you develop find upper and lower bounds on the number of memory transfers needed to actually layout the tree optimally? (The bounds in the papers are non-uniform: they are on the number of transfers to traverse the tree, not to actually lay out the tree.)  Is there a adaptive layout that converges on a close-to-optimal layout quickly?

 

  1. Can you do Least Common Ancestors optimally in o(n) extra space?  What about Level Ancestors?

 

  1. Can you make large-alphabet suffix trees dynamic and still fast? (You can always insert a new string with a log-alphabet overhead, but the point is to do it in linear time.)

 

  1. An implementation and comparison of various Cache Oblivious B-trees, versus each other and versus Cache Aware B-trees.  Testing on simulators versus measuring running time, number of page faults, cache misses, disk prefetching effects.  You can also implement this on the memory server of Gene Stark at Stony Brook.

 

  1. Binary search trees have some multiplicative overhead.  Getting the constant down is a good thing.  Can the packed memory structure be used for small memory search trees?  How small can you get the memory overhead to be, compared with standard search trees, and still yield competitive results?  Also, can you prove some small constant overhead if you expect random insertions?

 

  1. Knuth et alia pioneered the study of data structures where you alternate between uniformly inserting one element and uniformly deleting one element.  Search trees, e.g., have intriguing behavior in this model.  What happens to packed memory structures under this model?  Either experimentally or theoretically.

 

  1. The list labeling problem has matching upper and lower bounds of O(log n).  The closely related packed memory ds has an upper bound of O(log2 n).  Can you give a matching lower bound?  Nothing better than log is known. (Hard)

 

  1. Compare Iocono splay trees with traditional splay trees.

 

  1. Can either the simplified LCA or LA algorithms be made dynamic and still simple?

 

  1. Take any well-known complicated data structure and produce a simple version.  PQ-trees, cutting and linking trees, q-heaps, etc.