198:672 Index 22345 Section 01

Federated Computing Architectures
Prof. Richard P. Martin


This seminar will explore the software and hardware trade-offs of a class of machines called Federated Computing Architectures (FCAs). Loosely defined, such systems are those built from a hierarchy of processors, memory, buses, and high-performance switched interconnects. We will examine the benefits of these architectures from a systems perspective, focusing on both increased performance and availability.

On the availability front, we will explore how to construct fault-tolerant operating systems and applications from a collection of processors, memories, and interconnects. Faults on current high-end systems are often due to software or operator error; current research suggests that fewer and fewer system failures are caused by "hard" hardware errors. A federated architecture offers the operating system and application developer the possibility of better fault containment and recovery. Current operating systems and hardware architectures are quite brittle in this regard. For example, a single stray pointer can bring a current operating system to its knees. On the hardware side, a buggy device or bad firmware can easily lock up the system.

On the performance side, the seminar will examine the potential to better partition and orchestrate the collection of processors. We will draw on work from distributed systems and parallel programming to better understand which software and hardware interfaces are appropriate for an FCA. On the applications side, we will examine previous work in cluster and parallel algorithms and applications to better understand the strengths and weaknesses of an FCA.

The seminar meets Tuesdays, 2:50-5:50, in the Core B seminar room (Core 305). If you cannot make the first class, send mail to rmartin@cs.rutgers.edu. Prerequisites include CS 519 (operating systems) or an equivalent, as well as an understanding of I/O architecture. Some knowledge of the material in CS 528 (parallel computing) is also helpful. Students will be expected to provide a short write-up on the readings before each class and to complete a project due at the end of the semester.