Current operating systems contain millions of lines of code and thousands of interacting components. The fundamental problem at this level of complexity is understanding how those components interact. For example, how can we understand the sequence of events that happens in a highly loaded web server? How do we identify resource bottlenecks? For any given task, most of the operating system probably remains unused. How can we identify the components worth speeding up and ignore the rest? On a more global level, how can we identify unforeseen scheduling interactions? How can we isolate performance bugs?
Humans are very poor at understanding large volumes of information in text format, but in a visual format much more information can be presented in a meaningful manner. For example, imagine trying to understand the complex interactions of the weather from a text description. While text can convey simple information quickly, e.g. "sunny skies in the northeast", it is poorly suited to describing complex structures such as frontal waves in thunderstorms or the interlocking spiral bands in hurricanes.
Figure 1. This figure plots events as they unfold in the SPINE IP router. Time is shown on the x-axis and event type on the y-axis. A box is plotted for each event occurrence, and the width of the box corresponds to the duration of the event. The dashed rectangles correspond to higher-level packet semantics: receiving a packet, routing it, and forwarding it over the I/O bus. The arrows show the causal relationships of a single packet as it moves through the system.
On the other hand, a visual map easily conveys this class of information. In like manner, a visual map of the operating system can convey much more structural information than a textual representation, such as the listings often presented by debuggers. One method of mapping the operating system is an event plot: a two-dimensional map with time on the x-axis and event type on the y-axis. Figure 1 shows the basic structure of an event plot; it was generated from the code running inside a network interface.
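As a minimal sketch of the data model behind an event plot, assuming each logged event has been reduced to a (kind, start, end) record, the Python below maps each event type to a row and each occurrence to a box whose x-position is its start time and whose width is its duration. The `Event` type, `event_boxes`, and the text renderer `ascii_plot` are hypothetical names for illustration, not part of any existing tool; a real plotter would draw the same boxes with a graphics library.

```python
from typing import NamedTuple


class Event(NamedTuple):
    kind: str     # event type, plotted on the y-axis
    start: float  # start time, plotted on the x-axis
    end: float    # end time; the box width is end - start


def event_boxes(events, row_order=None):
    """Map each event to a (row, x, width) box for an event plot."""
    kinds = row_order or sorted({e.kind for e in events})
    row = {k: i for i, k in enumerate(kinds)}
    return [(row[e.kind], e.start, e.end - e.start) for e in events]


def ascii_plot(events, scale=1.0, row_order=None):
    """Render the boxes as text: one row per event type, '#' while active."""
    kinds = row_order or sorted({e.kind for e in events})
    t0 = min(e.start for e in events)
    t1 = max(e.end for e in events)
    width = int((t1 - t0) * scale) + 1
    lines = []
    for k in kinds:
        cells = [" "] * width
        for e in events:
            if e.kind != k:
                continue
            a = int((e.start - t0) * scale)
            b = max(a + 1, int((e.end - t0) * scale))  # draw at least one cell
            for i in range(a, min(b, width)):
                cells[i] = "#"
        lines.append(f"{k:>8} |" + "".join(cells))
    return "\n".join(lines)


events = [Event("rx", 0, 2), Event("route", 2, 3), Event("tx", 3, 5)]
print(ascii_plot(events, row_order=["rx", "route", "tx"]))
```

The `row_order` parameter is where the layout problem lives: choosing which event types sit on adjacent rows decides whether related activity appears together or scattered across the plot.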
A danger of event plots is that it is difficult to discern the inherent structure of the system from the plot alone; the plot can remain an inscrutable jumble. The key is to cluster related events together on the event axis. Figure 1 shows some higher-level structures: higher-level semantics are outlined by boxes and causal relationships are shown by arrows. However, these were carefully laid out by hand; in a system with thousands of events, such hand-crafted layouts would probably take too long. Part of a successful project would be to automatically find related event clusters and relationships.
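One simple starting point for automatic clustering, assuming events are logged as (kind, start, end) tuples, is to count how often an event of one type starts shortly after an event of another type ends, and then group the types that are linked often enough. The sketch below does this with a union-find over event types. The function names and the `window` and `min_count` thresholds are hypothetical, and temporal co-occurrence only suggests a causal arrow; it does not prove one.

```python
from collections import Counter, defaultdict


def follow_counts(events, window):
    """Count (a, b) pairs where an event of type b starts within `window`
    time units after an event of type a ends. events: (kind, start, end)."""
    evs = sorted(events, key=lambda e: e[1])  # order by start time
    counts = Counter()
    for i, (ka, _, end_a) in enumerate(evs):
        for kb, start_b, _ in evs[i + 1:]:
            if start_b > end_a + window:
                break  # every later event starts even later
            if start_b >= end_a and kb != ka:
                counts[(ka, kb)] += 1
    return counts


def cluster_kinds(events, window, min_count):
    """Group event types whose follow counts reach `min_count` (union-find)."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for kind, _, _ in events:
        parent.setdefault(kind, kind)
    for (a, b), n in follow_counts(events, window).items():
        if n >= min_count:
            parent[find(a)] = find(b)  # merge the two groups
    groups = defaultdict(list)
    for k in list(parent):
        groups[find(k)].append(k)
    return sorted(sorted(g) for g in groups.values())
```

The pairs with high counts are natural candidates for the arrows in a plot like Figure 1, and each resulting group can be placed on adjacent rows of the event axis.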
Plot an event sequence in a real operating system. You would have to design the infrastructure needed to log events, make sure that the event processing overhead is not too high, and then create a meaningful event plot. Just generating some plots for an interesting application, such as a web server, would be enough for a class project. An ambitious project might try some of the tasks outlined below.
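The logging infrastructure has to keep per-event overhead low. A common pattern is a preallocated, fixed-size ring buffer that is only drained for plotting after the run. The sketch below illustrates the idea with a hypothetical `EventLog` class, assuming Python's `time.perf_counter_ns` as the clock; inside a kernel the same structure would more likely be written in C and timestamped with a cycle counter.

```python
import time
from contextlib import contextmanager


class EventLog:
    """Fixed-size ring buffer of (kind, start, end) records.

    Storage is preallocated and the oldest record is overwritten when the
    buffer is full, so memory use stays bounded and logging does no
    formatting or I/O; records are only copied out by drain().
    """

    def __init__(self, capacity=4096, clock=time.perf_counter_ns):
        self.capacity = capacity
        self.kinds = [None] * capacity
        self.starts = [0] * capacity
        self.ends = [0] * capacity
        self.count = 0      # total records ever written
        self.clock = clock  # timestamp source (nanoseconds by default)

    def record(self, kind, start, end):
        i = self.count % self.capacity
        self.kinds[i] = kind
        self.starts[i] = start
        self.ends[i] = end
        self.count += 1

    @contextmanager
    def span(self, kind):
        """Log one event covering the body of a `with` block."""
        t0 = self.clock()
        try:
            yield
        finally:
            self.record(kind, t0, self.clock())

    def drain(self):
        """Return the surviving records, oldest first."""
        first = max(0, self.count - self.capacity)
        return [(self.kinds[j % self.capacity],
                 self.starts[j % self.capacity],
                 self.ends[j % self.capacity])
                for j in range(first, self.count)]


log = EventLog()
with log.span("serve_page"):
    sum(range(1000))  # stand-in for the work being traced
print(log.drain())
```

Overwriting the oldest entries trades completeness for a fixed cost: if the buffer is drained too rarely, early events are lost, but tracing never blocks or grows memory in the middle of the workload being measured.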
In a busy system, the same tasks are repeated over and over; a busy web server, for instance, will serve the same pages many times in quick succession. This repetition allows for the automatic discovery of event loops. Figure 1 shows a few repeating patterns in the event structure: the polling sequence for the input queues, for example, repeats many times and is thus a good candidate for optimization. One of the real values of an event map is the visualization of these higher-level loops. Once you have the basic event-gathering infrastructure, can you build a system to find event loops?
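A minimal sketch of loop discovery, assuming the log has already been reduced to a sequence of event-type names: scan for blocks of `p` consecutive events that repeat back-to-back at least `min_repeats` times, preferring the shortest period that covers the most events. The function `find_loops` and its parameters are hypothetical. This simple scan only finds loops whose iterations are identical and adjacent, so loops interleaved with unrelated concurrent activity would need a more tolerant method.

```python
def find_loops(kinds, max_period=8, min_repeats=3):
    """Find runs where a block of event types repeats back-to-back.

    kinds: list of event-type names in log order.
    Returns (start_index, loop_body, repeats) for each run found; at each
    position the shortest period covering the most events wins.
    """
    loops = []
    i = 0
    while i < len(kinds):
        best = None
        for p in range(1, max_period + 1):
            body = kinds[i:i + p]
            if len(body) < p:
                break  # not enough events left for this period
            reps = 1
            while kinds[i + reps * p:i + (reps + 1) * p] == body:
                reps += 1
            covered = reps * p
            if reps >= min_repeats and (
                best is None or covered > best[2] * len(best[1])
            ):
                best = (i, tuple(body), reps)
        if best:
            loops.append(best)
            i += best[2] * len(best[1])  # skip past the whole run
        else:
            i += 1
    return loops


trace = ["irq"] + ["poll_q0", "poll_q1", "poll_q2"] * 4 + ["route", "tx"]
print(find_loops(trace))  # [(1, ('poll_q0', 'poll_q1', 'poll_q2'), 4)]
```

Here the three-step polling sequence is reported once, with its repeat count, rather than as twelve separate events, which is the kind of higher-level loop an event map could then draw as a single box.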
Scheduling is a primary task of any operating system. A good event map could help the OS designer create better scheduling policies and decisions. Instead of just observing an event log, perhaps your system could give the OS designer the opportunity to experiment with fitting the various events together for increased performance.