Most of these ideas are in the early stages of development. Each description begins with a long introduction explaining the big picture. Don't feel intimidated by the scope of some of these projects: at the end of each description is a list of semester-sized pieces that fit within the context of the larger picture. If you would like to work on one of these projects, just send me e-mail (rmartin@cs.rutgers.edu) and we can set up a meeting. Students are also welcome to help set the direction of any of these projects.
High performance UDP/IP and TCP/IP
One emerging direction of recent work here at Rutgers is an exploration of the relationship between connectivity and performance in messaging layers. Connectivity is currently difficult to quantify; intuitively, it is the number of entities that a network can send data to or receive data from. In this project, you would construct a stand-alone UDP/IP implementation from scratch. The specific goal would be to get a UDP/IP packet onto the wire in fewer than 300 instructions. The protocol stack only has to work with one application at a time on a given host, and could even be tailored to a specific network card. In the wider context, the goal of this research would be to demonstrate that computer network designers can make a conscious choice between connectivity and performance.
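To give a feel for how little data a UDP send actually has to produce, here is a minimal sketch of building one UDP datagram by hand. It is only an illustration of the packet layout, not the project's implementation: a 300-instruction send path would be written in native code, and the IP header and network-card driver are left out entirely. The function names `internet_checksum` and `build_udp_datagram` are hypothetical, chosen for this example.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words, then complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_udp_datagram(src_ip: bytes, dst_ip: bytes,
                       src_port: int, dst_port: int,
                       payload: bytes) -> bytes:
    """Return the 8-byte UDP header followed by the payload (IPv4 only).

    src_ip and dst_ip are 4-byte IPv4 addresses; they are needed only for
    the pseudo-header that the UDP checksum covers.
    """
    length = 8 + len(payload)  # the UDP length field counts header + data
    pseudo_header = src_ip + dst_ip + struct.pack("!BBH", 0, 17, length)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    checksum = internet_checksum(pseudo_header + header + payload)
    if checksum == 0:
        checksum = 0xFFFF  # a transmitted zero means "no checksum" in UDP
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload


if __name__ == "__main__":
    datagram = build_udp_datagram(bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]),
                                  5000, 7, b"hello")
    print(datagram.hex())
```

Since the stack only has to serve one application and possibly one network card, most of these fields could be precomputed once, which is one place the instruction budget might be saved.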
Fault tolerant, high performance messaging for clusters
There are many high performance messaging layers for clusters. On the industrial side, the VIA architecture has emerged as a viable industry-standard messaging system for clusters. However, neither VIA nor most of the academic systems provide good error handling characteristics to applications. An interesting project would examine ways to extend these layers with better fault models. NOTE: a student has made much progress on this project! You can read about it in her master's thesis here.
Building a Central Limit-Order Book Application
Good systems projects have interesting applications to drive the research. For example, the NOW project had the Inktomi search engine, as well as the world's fastest sort for 1996-97. The Porcupine project used a mail-server application. The DASH project developed the SPLASH benchmarks. A central limit-order book is another challenging application for systems researchers. This class of application has to match and execute orders between buyers and sellers in a virtual marketplace. Follow the link for details.
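As a rough illustration of what "match and execute" means, here is a small single-process sketch of an order book. It assumes price-time priority (best price first, then earliest arrival) and integer prices, neither of which the description above specifies; the `OrderBook` class and its `submit` method are hypothetical names used only for this example.

```python
import heapq
from itertools import count


class OrderBook:
    """Toy limit-order book using price-time priority (an assumption)."""

    def __init__(self):
        self._bids = []      # heap entries: [-price, seq, price, qty]
        self._asks = []      # heap entries: [price, seq, price, qty]
        self._seq = count()  # arrival order breaks ties at equal prices

    def submit(self, side, price, qty):
        """Match an incoming limit order; return a list of (price, qty) trades.

        Any unfilled remainder rests in the book at its limit price.
        """
        if side not in ("buy", "sell"):
            raise ValueError("side must be 'buy' or 'sell'")
        own, opposite = ((self._bids, self._asks) if side == "buy"
                         else (self._asks, self._bids))
        trades = []
        while qty > 0 and opposite:
            resting = opposite[0]  # best-priced, oldest resting order
            resting_price = resting[2]
            crosses = (price >= resting_price if side == "buy"
                       else price <= resting_price)
            if not crosses:
                break
            fill = min(qty, resting[3])
            trades.append((resting_price, fill))  # trade at the resting price
            qty -= fill
            resting[3] -= fill  # quantity is not part of the heap ordering
            if resting[3] == 0:
                heapq.heappop(opposite)
        if qty > 0:
            key = -price if side == "buy" else price
            heapq.heappush(own, [key, next(self._seq), price, qty])
        return trades


if __name__ == "__main__":
    book = OrderBook()
    book.submit("sell", 101, 5)
    book.submit("sell", 100, 5)
    print(book.submit("buy", 101, 7))  # prints [(100, 5), (101, 2)]
```

The systems challenge lies in everything this sketch ignores: many concurrent clients, durability of accepted orders, and keeping latency low under load.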
Operating System Visualization
Current operating systems contain millions of lines of code and thousands of interacting components. The fundamental issue with this level of complexity is how to understand the complex interactions in this environment. For example, how can we understand the sequence of events that happens in a highly loaded web server? How do we identify resource bottlenecks? A visual map of the operating system can convey much more structural information than a textual representation, such as is often presented in debuggers. The idea of this project would be the automatic construction of meaningful maps of real operating systems.
I/O is a critical component of large scale computer systems. These systems run databases, web servers and commerce servers. Current I/O systems suffer from a lack of scalability, availability and manageability (SAM). Ideally, these systems should scale to hundreds of terabytes of capacity and support thousands of simultaneous users. The I/O system must also provide 24x7 availability. In addition, the system should allow the administrator to accept reduced availability in exchange for increased capacity or performance. Finally, large scale I/O systems must be tractable to operate. This project works towards the goal of providing an I/O sub-system designed from the SAM perspective.
Develop a Self-Similar Workload Benchmark for File Systems
This project would observe the effects of changing the definition of "load" for SPEC's file server benchmark, SFS97. Currently, the clients generate requests with Poisson arrivals to model load on the server. However, a recent paper has shown that on timescales comparable to the length of the benchmark (minutes), file operation arrivals are better characterized as self-similar. How do response time and peak throughput change compared to the traditional workload?
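The difference between the two load models can be seen in a small simulation. The sketch below compares Poisson arrivals with a renewal process whose gaps are drawn from a heavy-tailed Pareto distribution. The Pareto process is only a simple stand-in for bursty traffic, not a rigorous self-similar generator, and it is not how SFS97 or the paper mentioned above generate load. The rate, shape parameter, seed and function names are all assumptions made for this example.

```python
import random
import statistics


def poisson_arrivals(rate, n, rng):
    """Arrival times with exponential gaps (a Poisson process)."""
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate)
        times.append(t)
    return times


def heavy_tailed_arrivals(rate, n, rng, alpha=1.5):
    """Arrival times with Pareto gaps that have the same nominal mean 1/rate.

    For 1 < alpha < 2 the gaps have a finite mean but infinite variance,
    which yields long idle periods separated by bursts.
    """
    scale = (1.0 / rate) * (alpha - 1.0) / alpha  # mean = scale*alpha/(alpha-1)
    t, times = 0.0, []
    for _ in range(n):
        t += scale * rng.paretovariate(alpha)
        times.append(t)
    return times


def index_of_dispersion(times, window):
    """Variance-to-mean ratio of arrival counts per window of fixed length.

    A Poisson process gives a value near 1; burstier traffic gives more.
    """
    n_windows = int(times[-1] // window)
    if n_windows < 2:
        raise ValueError("trace too short for this window size")
    counts = [0] * n_windows
    for t in times:
        i = int(t // window)
        if i < n_windows:  # drop the final partial window
            counts[i] += 1
    return statistics.pvariance(counts) / statistics.mean(counts)


if __name__ == "__main__":
    rng = random.Random(1)
    smooth = poisson_arrivals(10.0, 20000, rng)
    bursty = heavy_tailed_arrivals(10.0, 20000, rng)
    for window in (1.0, 10.0):
        print(window,
              index_of_dispersion(smooth, window),
              index_of_dispersion(bursty, window))
```

A benchmark client could replay either trace against the file server at the same average rate, which would isolate the effect of burstiness on response time and peak throughput.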
Fault Tolerant Data Structures
Critical to the operation of any fault-tolerant system is a set of fault-tolerant data structures and well-defined methods on those structures. This project would design and implement some fault-tolerant data structures in the context of a cluster architecture. Thus, you are free to make some simplifying assumptions about clusters (e.g., non-Byzantine failures).
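One possible shape for such a structure is a map replicated across several nodes, with reads and writes that each need a majority of replicas. The sketch below simulates this in a single process. It assumes fail-stop replicas (a crashed replica simply stops answering and never lies) and a single writer, and it models "nodes" as in-memory objects; `Replica` and `QuorumMap` are hypothetical names, not part of any existing system.

```python
class Replica:
    """One simulated cluster node holding key -> (version, value)."""

    def __init__(self):
        self.alive = True
        self.store = {}

    def write(self, key, version, value):
        if not self.alive:
            raise ConnectionError("replica is down")
        if version > self.store.get(key, (0, None))[0]:
            self.store[key] = (version, value)  # keep only the newest version

    def read(self, key):
        if not self.alive:
            raise ConnectionError("replica is down")
        return self.store.get(key, (0, None))


class QuorumMap:
    """Map that tolerates a minority of fail-stop replica crashes.

    Assumes a single writer, so one local counter can order all writes.
    """

    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.quorum = n_replicas // 2 + 1  # any two majorities overlap
        self._version = 0

    def put(self, key, value):
        self._version += 1
        acks = 0
        for replica in self.replicas:
            try:
                replica.write(key, self._version, value)
                acks += 1
            except ConnectionError:
                pass  # a crashed replica just misses this write
        if acks < self.quorum:
            raise RuntimeError("write failed: no majority of replicas")

    def get(self, key):
        replies = []
        for replica in self.replicas:
            try:
                replies.append(replica.read(key))
            except ConnectionError:
                pass
        if len(replies) < self.quorum:
            raise RuntimeError("read failed: no majority of replicas")
        # The majority overlap guarantees the newest version is among replies.
        return max(replies, key=lambda reply: reply[0])[1]


if __name__ == "__main__":
    m = QuorumMap(3)
    m.put("owner", "node-a")
    m.replicas[0].alive = False  # one crash is tolerated
    print(m.get("owner"))        # prints node-a
```

Because failures are assumed to be non-Byzantine, a reader can trust any reply it receives and only has to pick the highest version; handling lying replicas would require a much heavier protocol.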