Papers and Abstracts
( PDF 2.1 MB )
Abstract: We explore the problem of detecting whether a device has moved within a room. Our approach relies on comparing summaries of received signal strength (RSS) measurements over time, which we call descriptors. We consider descriptors based on differences in the mean, the standard deviation, and histogram comparison. Across close to 1000 mobility events we conducted, our approach delivers perfect recall and near-perfect precision for detecting mobility at a granularity of a few seconds. It is robust to the movement of dummy objects near the transmitter as well as to people moving within the room. Detection succeeds because true mobility causes fast fading while environmental mobility causes shadow fading, and the two exhibit considerably different signal distributions. The consistently good detection accuracy throughout the experiments also demonstrates that our approach applies across varying room environments and radio technologies, thus enabling novel security, health care, and inventory control applications.
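A descriptor comparison of this kind can be sketched in a few lines. The window handling, thresholds, and histogram range below are illustrative assumptions, not the paper's tuned values:

```python
import numpy as np

def histogram_distance(a, b, bins=20, value_range=(-100.0, -20.0)):
    """L1 distance between normalized RSS histograms (values in dBm)."""
    ha, _ = np.histogram(a, bins=bins, range=value_range, density=True)
    hb, _ = np.histogram(b, bins=bins, range=value_range, density=True)
    width = (value_range[1] - value_range[0]) / bins
    return float(np.abs(ha - hb).sum() * width)

def device_moved(before, after, mean_thresh=3.0, hist_thresh=0.5):
    """Compare descriptors of two RSS windows; flag mobility if they diverge."""
    d_mean = abs(float(np.mean(after)) - float(np.mean(before)))
    d_std = abs(float(np.std(after)) - float(np.std(before)))
    d_hist = histogram_distance(before, after)
    return d_mean > mean_thresh or d_std > mean_thresh or d_hist > hist_thresh
```

The intuition is that true mobility shifts the mean and reshapes the fast-fading distribution, while shadow fading from nearby people leaves successive window summaries close to each other.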
( PDF 1.5 MB )
Abstract: This paper describes tracking experiments we ran in medical settings to study the utility of real-time location systems (RTLS) in modeling hospital workflows in real time. It reports lessons we learned regarding the density of the tracking-landmark deployment, as well as the power and mounting of the tracking tags, and their effect on tracking accuracy. It then discusses how we deduced the clinical activities along the workflow from the RTLS tracking data.
( PDF 240 Kb )
Abstract: We show that signal strength variability can be reduced by employing multiple low-cost antennas at fixed locations. We further explore the impact of this reduction on wireless localization by analyzing a representative set of algorithms ranging from fingerprint matching, to statistical maximum likelihood estimation, to multilateration. Using an indoor wireless testbed, we provide an experimental evaluation of localization performance with multiple antennas. We found that in nearly all cases the performance of localization algorithms improved when using multiple antennas: the median and the 90th percentile error can be reduced by up to 70%. Additionally, we found that multiple antennas improve localization stability significantly, by up to 100%, when there are small-scale 3-dimensional movements of a mobile device around a given location.
( PDF 228 Kb )
Abstract: We present two parallel algorithms and their Unified Parallel C implementations for Bayesian indoor positioning systems. Our approaches are founded on Markov chain Monte Carlo simulations. We evaluated two basic partitioning schemes: inter-chain partitioning, which distributes entire Markov chains to different processors, and intra-chain partitioning, which distributes a single chain across processors. Evaluations on a 16-node symmetric multiprocessor, a 4-node cluster of quad-processor machines, and a 16-node cluster of single-processor machines suggest that for short chains intra-chain partitioning scales well on the first two platforms, with speedups of up to 12. On the other hand, inter-chain partitioning gives speedups of 12 only for very long chains, sometimes of up to 60,000 iterations, on all three platforms. We used the LogGP model to analyze our algorithms and predict their performance. Model predictions for inter-chain partitioning are within 5% of the actual execution time, while for intra-chain partitioning they are 7%-25% less due to load imbalance not captured in the model.
( PDF 1.15 Mb )
Abstract: We analyze the feasibility of bootstrapping a location service through geocoded postal addresses rather than the common wardriving technique. A location service that contains the MAC addresses and geographic positions of wireless LAN access points enables positioning services for WLAN devices and location-aware networking protocols. This work thus compares the accuracy of access point position estimates obtained from RF signal strength readings (wardriving) with the accuracy of geocoded postal addresses. The results show accuracy for geocoding similar to that of typical wardriving studies, with significantly reduced effort when the postal addresses of access point locations are known.
( PDF 258 Kb )
Abstract: Accurately positioning nodes in wireless and sensor networks is important because the location of sensors is a critical input to many higher-level networking tasks. However, the localization infrastructure can be subjected to non-cryptographic attacks, such as signal attenuation and amplification, that cannot be addressed by traditional security services. We propose several attack detection schemes for wireless localization systems. We first formulate a theoretical foundation for the attack detection problem using statistical significance testing. Next, we define test metrics for two broad localization approaches: multilateration and signal strength. We then derive both mathematical models and analytic solutions for attack detection for any system that utilizes those approaches. We also study additional test statistics that are specific to a diverse set of algorithms. Our trace-driven experimental results provide strong evidence of the effectiveness of our attack detection schemes, with high detection rates and low false positive rates across both an 802.11 (WiFi) network and an 802.15.4 (ZigBee) network in two real office buildings. Surprisingly, we found that all of the methods we describe provide qualitatively similar detection rates, indicating that the different localization systems contain similar attack detection capability.
( PDF 3.2 Mb )
Abstract: Wireless networks are vulnerable to spoofing attacks, which enable many other forms of attacks on the networks. Although the identity of a node can be verified through cryptographic authentication, authentication is not always possible because it requires key management and additional infrastructural overhead. In this paper we propose a method for both detecting spoofing attacks and locating the positions of the adversaries performing them. We first propose an attack detector for wireless spoofing that utilizes K-means cluster analysis. Next, we describe how we integrated our attack detector into a real-time indoor localization system, which is also capable of localizing the positions of the attackers. We then show that the positions of the attackers can be localized using either area-based or point-based localization algorithms with the same relative errors as in the normal case. We have evaluated our methods through experimentation using both an 802.11 (WiFi) network and an 802.15.4 (ZigBee) network. Our results show that it is possible to detect wireless spoofing with both a high detection rate and a low false positive rate, thereby providing strong evidence of the effectiveness of the K-means spoofing detector as well as the attack localizer.
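The clustering idea behind the detector can be sketched as follows. This is a minimal illustration, not the paper's implementation: the farthest-pair initialization, the feature layout (one RSS vector per reading, one column per landmark), and the threshold are assumptions made here for the sake of a runnable example.

```python
import numpy as np

def two_means(points, iters=20):
    """Minimal 2-means over RSS vectors, initialized at the two mutually
    farthest readings so the result is deterministic."""
    pts = np.asarray(points, dtype=float)
    dists = np.linalg.norm(pts[:, None] - pts[None], axis=2)
    i, j = np.unravel_index(dists.argmax(), dists.shape)
    centers = pts[[i, j]].copy()
    for _ in range(iters):
        # Assign each reading to its nearest center, then recompute centers.
        labels = np.linalg.norm(pts[:, None] - centers[None], axis=2).argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = pts[labels == k].mean(axis=0)
    return centers

def spoofing_suspected(readings, threshold=8.0):
    """A large gap between the two centroids suggests the readings came from
    two physically separate transmitters claiming a single identity."""
    c0, c1 = two_means(readings)
    return bool(np.linalg.norm(c0 - c1) > threshold)
```

Readings from one legitimate device form a single tight cluster, so the two fitted centroids stay close; a spoofer transmitting from elsewhere pulls them far apart.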
( PDF 171 Kb )
Abstract: In this paper we describe a radio-based localization approach that is based on the use of rotating directional antennas and a Bayesian network that combines both angle-of-arrival (AoA) and Received Signal Strength (RSS). After describing our network, we extensively characterize the accuracy of our approach under a variety of measured signal distortion types. Next, using a combination of synthetic and trace-driven experiments, we show the impact of different signal distortions on localization performance. We found the use of directional antennas was effective at averaging out multi-path effects in indoor environments, which helped reduce the amount of training data required compared to previous approaches.
( PDF 85 Kb )
Abstract: Operator mistakes have been identified as a significant source of unavailability in Internet services. In this paper, we propose a new language, A, for service engineers to write assertions about expected behaviors, proper configurations, and proper structural characteristics. This formalized specification of correct behavior can be used to bolster system understanding, as well as help to flag operator mistakes in a distributed system. Operator mistakes can be caused by anything from static misconfiguration to physical placement of wires and machines. This language, along with its associated runtime system, seeks to be flexible and robust enough to deal with the wide array of operator mistakes while maintaining a simple interface for designers or programmers.
( PDF 253 Kb )
Abstract: We investigate the impact of landmark placement on localization performance using a combination of analytic and experimental analysis. For our analysis, we have derived an upper bound on the localization error of the linear least squares algorithm. This bound reflects the placement of landmarks as well as measurement errors at the landmarks. We next develop a novel algorithm, maxL-min, which uses our analysis to find a landmark placement pattern that minimizes the maximum localization error. To show our results are applicable to a variety of localization algorithms, we then conducted a series of localization experiments using both an 802.11 (WiFi) network and an 802.15.4 (ZigBee) network in a real building environment, with both Received Signal Strength (RSS) and Time-of-Arrival (ToA) as ranging modalities. Our experimental results show that our landmark placement algorithm is generic: the resulting placements improve localization performance across a diverse set of algorithms, networks, and ranging modalities.
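The linear least squares step that such a bound analyzes can be sketched briefly. Subtracting the first landmark's range equation from the others cancels the quadratic terms, leaving a linear system in the unknown position; the coordinates in the test below are illustrative, not from the paper's deployments.

```python
import numpy as np

def lls_locate(landmarks, ranges):
    """Linearized least-squares multilateration in the plane.

    Each landmark i at (xi, yi) with measured range di gives
    (x - xi)^2 + (y - yi)^2 = di^2.  Subtracting the i = 0 equation
    from the rest yields the linear system A p = b for p = (x, y).
    """
    L = np.asarray(landmarks, dtype=float)
    d = np.asarray(ranges, dtype=float)
    A = 2.0 * (L[1:] - L[0])
    b = (d[0] ** 2 - d[1:] ** 2) + np.sum(L[1:] ** 2 - L[0] ** 2, axis=1)
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos
```

Because the system is over-determined, errors in the measured ranges propagate into the solution through A, which is exactly where landmark geometry enters the error bound.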
( PDF 715 Kb )
Abstract: In this work we show how to reduce the computational cost of using Bayesian networks for localization. We investigate a range of Monte Carlo sampling strategies, including Gibbs and Metropolis. We found that for our Gibbs samplers, most of the time is spent in slice sampling. Moreover, our results show that although uniform sampling over the entire domain suffers occasional rejections, it has a much lower overall computational cost than approaches that carefully avoid rejections. The key reason for this efficiency is the flatness of the full conditionals in our localization networks. Our sampling technique is also attractive because it does not require extensive tuning to achieve good performance, unlike the Metropolis samplers. We demonstrate that our whole domain sampling technique converges accurately with low latency. On commodity hardware our sampler localizes up to 10 points in less than half a second, which is over 10 times faster than a common general-purpose Bayesian sampler. Our sampler also scales well, localizing 51 objects with no location information in the training set in less than 6 seconds. Finally, we present an analytic model that describes the number of evaluations per variable using slice sampling. The model allows us to analytically determine how flat a distribution should be so that whole domain sampling is computationally more efficient when compared to other methods.
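Whole-domain sampling amounts to slice sampling with a maximally wide proposal. A toy one-dimensional version conveys the idea; the target density and domain here are illustrative, not the localization network's actual full conditionals.

```python
import math
import random

def whole_domain_slice_step(f, x, lo, hi, rng=random):
    """One slice-sampling step: draw a height u under f(x), then propose
    uniformly over the whole domain [lo, hi] until a point lands inside
    the slice {x' : f(x') > u}.  When f is nearly flat, almost every
    proposal is accepted, so rejections are rare and each step is cheap."""
    u = rng.uniform(0.0, f(x))
    while True:
        x_new = rng.uniform(lo, hi)
        if f(x_new) > u:
            return x_new

# A broad, nearly flat target, like the full conditionals described above.
def broad_density(x):
    return math.exp(-((x - 0.5) ** 2) / 0.5)
```

The flatter the conditional, the closer the slice is to the whole domain, which is why the rejection cost stays low without any per-variable tuning.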
Yingying Chen, Konstantinos Kleisouris, Xiaoyan Li, Wade Trappe, Richard P. Martin, The Robustness of Localization Algorithms to Signal Strength Attacks: A Comparative Study, in the 2006 International Conference on Distributed Computing in Sensor Systems (DCOSS 2006), San Francisco, CA, June 2006.
( PDF 281 Kb )
Abstract: In this paper, we examine several localization algorithms and evaluate their robustness to attacks where an adversary attenuates or amplifies the signal strength at one or more landmarks. We propose several performance metrics that quantify the estimators' precision and error, including Hölder metrics, which quantify the variability in position space for a given variability in signal strength space. We then conduct a trace-driven evaluation of several point-based and area-based algorithms, in which we measured their performance as we applied attacks on real data from two different buildings. We found the median error degraded gracefully, with a linear response as a function of the attack strength. We also found that area-based algorithms experienced a decrease and a spatial shift in the returned area under attack, implying that precision increases though bias is introduced for these schemes. We observed both strong experimental and theoretical evidence that all the algorithms have similar average responses to signal strength attacks.
( PDF 221 Kb )
Abstract: A large number of enterprises need their commodity database systems to remain available at all times. Although administrator mistakes are a significant source of unavailability and cost in these systems, no study to date has sought to quantify the frequency of mistakes in the field, understand the context in which they occur, or develop system support to deal with them explicitly. In this paper, we first characterize the typical administrator tasks, testing environments, and mistakes using results from an extensive survey we have conducted of 51 experienced administrators. Given the results of this survey, we next propose system support to validate administrator actions before they are made visible to users. Our prototype implementation creates a validation environment that is an extension of a replicated database system, where administrator actions can be validated using real workloads. The prototype implements three forms of validation, including a novel form in which the behavior of a database replica can be validated even without an example of correct behavior for comparison. Our results show that the prototype can detect the major classes of administrator mistakes.
( PDF 574 Kb )
Abstract: In this paper we present a simple ray-sector model of signal strength for indoor 802.11 networks. Signal strength is an important parameter for a variety of wireless networking tasks, such as localization and topology control. A sufficiently accurate, yet generic, method of generating signal strength maps is needed in order to accurately simulate, design, and evaluate these systems. Our ray-sector model constructs signal maps by adding signal bias within sectors defined by rings and randomized rays, using a traditional log-linear decay model as a baseline. We show our model generates distortions similar to measured radio maps by quantitatively comparing the behavior of micro-benchmarks using maps from two buildings and those generated by our model. Finally, we demonstrate the utility of our model for higher-level applications by showing it accurately predicts the performance of two dissimilar localization algorithms.
( PDF 287 Kb )
Abstract: In this paper, we introduce a new approach to location estimation where, instead of locating a single client, we simultaneously locate a set of wireless clients. We present a Bayesian hierarchical model for indoor location estimation in wireless networks. We demonstrate that our model achieves accuracy that is similar to other published models and algorithms. By harnessing prior knowledge, our model eliminates the requirement for training data as compared with existing approaches, thereby introducing the notion of a fully adaptive zero profiling approach to location estimation.
( PDF 237 Kb )
Abstract: Operator mistakes are a significant source of unavailability in modern Internet services. In this paper, we first characterize these mistakes by performing an extensive set of experiments using human operators and a realistic three-tier auction service. The mistakes we observed range from software misconfiguration, to fault misdiagnosis, to incorrect software restarts. We next propose to validate operator actions before they are made visible to the rest of the system. We demonstrate how to accomplish this task via the creation of a validation environment that is an extension of the online system, where components can be validated using real workloads before they are migrated into the running service. We show that our prototype validation system can detect 66% of the operator mistakes that we have observed.
Eiman Elnahrawy, Xiaoyan Li, Richard P. Martin, Using Area-based Presentations and Metrics for Localization Systems in Wireless LANs, 4th IEEE Workshop on Wireless Local Networks, Tampa, FL, November 2004.
( PDF 167 Kb )
Abstract: In this work we show the utility of WLAN localization using areas and volumes as the fundamental localization unit. We demonstrate that area-based algorithms have a critical advantage over point-based approaches because they are better able to describe localization uncertainty, which is a common theme across WLAN based localization systems. Next, we present two novel area-based algorithms. To evaluate area-based approaches, we introduce several new localization performance metrics. We then evaluate the two algorithms using our metrics with data collected from our local WLAN. Finally, we compare our area-based algorithms against traditional point-based approaches. We found that using areas improves the ability of the localization system to give users meaningful alternatives in the face of position uncertainty.
Andrew Tjang, Michael Pagliorola, Hiral Patel, Xiaoyan Li, Richard P. Martin, Active Tapes: Bus-Based Sensor Networks, Rutgers Department of Computer Science Technical Report DCS-TR-557, July, 2004 (A short version appeared in the EmNETs-I workshop)
( PDF 202 Kb )
Abstract: In this work we present Active Tapes, a bus organization for sensor networks. We construct simple cost models to compare active tapes against more traditional wireless sensor networks. Our models show that density, lifetime, and power consumption play significant roles in determining overall deployment and maintenance cost. We then characterize regimes where tapes are always cost effective, sometimes cost effective, and never cost effective. We then describe three real-world applications using our models. Next, we present an initial implementation of an active tape for power distribution and networking. Finally, we conclude with a discussion of future directions for the active tape architecture.
Eiman Elnahrawy, Xiaoyan Li, Richard P. Martin, The Limits of Localization Using Signal Strength: A Comparative Study, In Proceedings of the IEEE Conference on Sensor and Ad Hoc Communication Networks (SECON), Santa Clara, CA, October 2004.
( PDF 206 Kb )
Abstract: We characterize the fundamental limits of localization using signal strength in indoor environments. Signal strength approaches are attractive because they are widely applicable to wireless sensor networks and do not require additional localization hardware. We show that although a broad spectrum of algorithms can trade accuracy for precision, none has a significant advantage in localization performance. We found that using commodity 802.11 technology over a range of algorithms, approaches and environments, one can expect a median localization error of 10ft and 97th percentile of 30ft. We present strong evidence that these limitations are fundamental and that they are unlikely to be transcended without fundamentally more complex environmental models or additional localization infrastructure.
Xiaoyan Li, Thu D. Nguyen, Richard P. Martin, Using Adaptive Range Control to Maximize 1-Hop Broadcast Coverage in Dense Wireless Networks, In Proceedings of the IEEE Conference on Sensor and Ad Hoc Communication Networks (SECON), Santa Clara, CA, October 2004.
( PDF 188 Kb )
(Technical Report DCS-TR-529 Version PDF 310 Kb)
Abstract: We present a distributed algorithm for maximizing 1-hop broadcast coverage in dense wireless sensor networks. Our strategy is built upon an analytic model that predicts the optimal range for maximizing 1-hop broadcast coverage given information like network density and node sending rate. The algorithm allows each node to set the maximizing radio range using only the locally observed sending rate and node density. The algorithm is thus critically dependent on the empirical determination of these parameters. Our algorithm can observe the parameters using only message eavesdropping and thus does not require extra protocol messages. Using simulation, we show that in spite of many simplifications in the model and incomplete density information in a live network, our algorithm converges fairly quickly and provides good coverage for both uniform and non-uniform networks across a wide range of conditions. We also demonstrate the utility of our algorithm for higher layer protocols by showing that it significantly improves the reception rate for a flooding application as well as the performance of a localization protocol.
Xiaoyan Li, Thu D. Nguyen, Richard P. Martin, An Analytic Model Predicting the Optimal Range for Maximizing 1-Hop Broadcast Coverage in Dense Wireless Networks, In Proceedings of the 3rd International Conference on Ad-Hoc Networks & Wireless (AdHoc NOW), Vancouver, Canada, July 2004.
( PDF 118 Kb )
Abstract: We present an analytic model to predict the optimal range for maximizing 1-hop broadcast coverage in dense ad-hoc wireless networks, using information like network density and node sending rate. We first derive a geometric-based, probabilistic model that describes the expected coverage as a function of range, sending rate and density. Because we can only solve the resulting equations numerically, we next develop extrapolations that find the optimum range for any rate and density given a single precomputed optimum. Finally using simulation, we show that in spite of many simplifications in the model, our extrapolation is able to predict the optimal range to within 16%.
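The shape of such a coverage optimum can be illustrated with a toy model. The collision term below is a simple ALOHA-style assumption introduced here for illustration, not the paper's geometric derivation: growing the range reaches more neighbors but raises the chance each reception is corrupted by a concurrent transmission.

```python
import math

def expected_coverage(r, density, rate, tx_time):
    """Expected receivers per broadcast under a toy collision model.

    density * pi * r^2 nodes hear the sender, but each reception is
    lost if any node in the receiver's own neighborhood transmits
    during the vulnerable window (modeled as a Poisson probability).
    """
    neighbors = density * math.pi * r * r
    p_clear = math.exp(-2.0 * rate * tx_time * neighbors)
    return neighbors * p_clear

def best_range(density, rate, tx_time, r_max=100.0, steps=2000):
    """Numerically find the coverage-maximizing radio range."""
    rs = [r_max * i / steps for i in range(1, steps + 1)]
    return max(rs, key=lambda r: expected_coverage(r, density, rate, tx_time))
```

In this toy form the trade-off n * exp(-c * n), with n the neighborhood size, peaks at n = 1/c, so the optimal range falls as density or sending rate grows — the same qualitative behavior the analytic model captures.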
Kiran Nagaraja, Neeraj Krishnan, Ricardo Bianchini, Richard P. Martin, Thu D. Nguyen, Quantifying and Improving the Availability of High-Performance Cluster-Based Internet Services, In Proceedings of the SC Conference 2003, Phoenix, AZ, Nov., 2003.
Abstract: Cluster-based servers can substantially increase performance when nodes cooperate to globally manage resources. However, in this paper we show that cooperation results in a substantial availability loss, in the absence of high-availability mechanisms. Specifically, we show that a sophisticated cluster-based Web server, which gains a factor of 3 in performance through cooperation, increases service unavailability by a factor of 10 over a non-cooperative version. We then show how to augment this Web server with software components embodying a small set of high-availability techniques to regain the lost availability. Among other interesting observations, we show that the application of multiple high-availability techniques, each implemented independently in its own subsystem, can lead to inconsistent recovery actions. We also show that a novel technique called Fault Model Enforcement can be used to resolve such inconsistencies. Augmenting the server with these techniques led to a final expected availability of close to 99.99%.
Francisco Matias Cuenca-Acuna, Richard P. Martin, Thu D. Nguyen, Autonomous Replication for High Availability in Unstructured P2P Systems, In Proceedings of the IEEE Symposium on Reliable Distributed Systems (SRDS), Florence, Italy, Oct., 2003.
Abstract: We consider the problem of increasing the availability of shared data in peer-to-peer systems. In particular, we conservatively estimate the amount of excess storage required to achieve a practical availability of 99.9% by studying a decentralized algorithm that only depends on a modest amount of loosely synchronized global state. Our algorithm uses randomized decisions extensively together with a novel application of an erasure code to tolerate autonomous peer actions as well as staleness in the loosely synchronized global state. We study the behavior of this algorithm in three distinct environments modeled on previously reported measurements. We show that while peers act autonomously, the community as a whole will reach a stable configuration. We also show that space is used fairly and efficiently, delivering three nines availability at a cost of six times the storage footprint of the data collection when the average peer availability is only 24%.
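The availability arithmetic behind erasure coding can be sketched as a binomial tail; the parameter values below are illustrative, not the paper's measured settings. A file split into n fragments, any k of which suffice to reconstruct it, is available whenever at least k fragment-holding peers are on-line.

```python
from math import comb

def fragment_availability(n, k, p):
    """P(at least k of n erasure-coded fragments are reachable), assuming
    each fragment sits on an independent peer that is on-line with
    probability p."""
    return sum(comb(n, m) * p ** m * (1 - p) ** (n - m) for m in range(k, n + 1))
```

At the same storage factor n/k = 6, spreading a file over many small fragments (say n = 48, k = 8) yields higher availability than keeping 6 whole replicas, which is why erasure coding pays off when average peer availability is low.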
Francisco Matias Cuenca-Acuna, Christopher Peery, Richard P. Martin, Thu D. Nguyen, PlanetP: Using Gossiping to Build Content Addressable Peer-to-Peer Information Sharing Communities, In Proceedings of the IEEE International Symposium on High Performance Distributed Computing (HPDC-12), Seattle, WA, June 2003.
(PDF, 190 Kb)
Abstract: We introduce PlanetP, a content addressable publish/subscribe service for unstructured peer-to-peer (P2P) communities. PlanetP supports content addressing by providing: (1) a gossiping layer used to globally replicate a membership directory and an extremely compact content index, and (2) a completely distributed content search and ranking algorithm that helps users find the most relevant information. PlanetP is a simple, yet powerful system for sharing information. PlanetP is simple because each peer must only perform a periodic, randomized, point-to-point message exchange with other peers. PlanetP is powerful because it maintains a globally content-ranked view of the shared data. Using simulation and a prototype implementation, we show that PlanetP achieves ranking accuracy that is comparable to a centralized solution and scales easily to several thousand peers while remaining resilient to rapid membership changes.
Chen Fu, Richard P. Martin, Kiran Nagaraja, David Wonnacott, Thu D. Nguyen, Barbara G. Ryder, Compiler-directed Program-fault Coverage for Highly Available Java Internet Services, In Proceedings of the International Conference on Dependable Systems and Networks (DSN 2003), San Francisco, CA, June 2003.
(PDF, 135 Kb)
Abstract: We present a new approach that uses compiler-directed fault injection for coverage testing of recovery code in Internet services, to evaluate their robustness to operating system and I/O hardware faults. We define a set of program-fault coverage metrics that enable quantification of the Java catch blocks exercised during fault-injection experiments. We use compiler analyses to instrument application code in two ways: to direct fault injection to occur at appropriate points during execution, and to measure the resulting coverage. As a proof of concept for these ideas, we have applied our techniques manually to Muffin, a proxy server; we obtained a high degree of coverage of catch blocks, with, on average, 85% of the expected faults per catch block being experienced as caught exceptions.
Kiran Nagaraja, Xiaoyan Li, Ricardo Bianchini, Richard P. Martin, Thu D. Nguyen, Using Fault Injection and Modeling to Evaluate the Performability of Cluster-Based Services, In Proc. of the 4th USENIX Symposium on Internet Technologies and Systems, Seattle, WA, March 2003.
Abstract: We propose a two-phase methodology for quantifying the performability (performance and availability) of cluster-based Internet services. In the first phase, evaluators use a fault-injection infrastructure to measure the impact of faults on the server's performance. In the second phase, evaluators use an analytical model to combine an expected fault load with measurements from the first phase to assess the server's performability. Using this model, evaluators can study the server's sensitivity to different design decisions, fault rates, and environmental factors. To demonstrate our methodology, we study the performability of 4 versions of the PRESS Web server against 5 classes of faults, quantifying the effects of different design decisions on performance and availability. Finally, to further show the utility of our model, we also quantify the impact of two hypothetical changes: reduced human operator response time and the use of RAIDs.
Kiran Nagaraja, Neeraj Krishnan, Ricardo Bianchini, Richard P. Martin, Thu D. Nguyen, Evaluating the Impact of Communication Architecture on the Performability of Cluster-Based Services, In Proc. of the Ninth International Symposium on High Performance Computer Architecture (HPCA-9), Anaheim, CA, February 2003.
(PDF, 281 Kbytes)
Abstract: We consider the impact of different communication architectures on the performability (performance + availability) of cluster-based servers. In particular, we use a combination of fault-injection experiments and analytic modeling to evaluate the performability of two popular communication protocols, TCP and VIA, as the intra-cluster communication substrate of a sophisticated Web server. Our analysis leads to several interesting conclusions, the most surprising of which is that, under the same fault load, VIA-based servers deliver greater availability than TCP-based servers. If we assume higher fault rates for VIA-based servers, because the underlying technology is more immature and the programming model more complex, we find that packet errors or application faults would have to occur at approximately 4 times the rate of TCP-based servers before their performabilities equalize. We use these results to suggest that high-performance, robust communication layers for highly available cluster-based servers should preserve message boundaries (as opposed to using byte streams), use single-copy transfers, pre-allocate channel resources, and report errors in a manner consistent with the network fabric's fault model.
Kiran Nagaraja, Ricardo Bianchini, Richard P. Martin, Thu D. Nguyen, Using Fault Model Enforcement to Improve Availability, In Proceedings of the Second Workshop on Evaluating and Architecting System dependabilitY (EASY), San Jose, CA, October 2002.
(PDF, 178 Kbytes)
Abstract: Today's network services run on complex arrays of computing systems consisting of a myriad of hardware and software components. In this work, we claim that it is impractical to try to tolerate all (or even a significant fraction of) fault types in these systems. We argue instead that a new approach, called fault model enforcement, that maps actual faults to expected faults of an abstract fault model can be used to increase system availability. This enforcement approach works because it transforms faults not factored into the initial design into faults that the system was designed to tolerate. Using fault injection and analytic modeling, we show that this enforcement approach has the potential to decrease the unavailability of a distributed Web server by over 50%.
Taliver Heath, Richard P. Martin, Thu D. Nguyen, Improving Cluster Availability Using Workstation Validation, In SIGMETRICS 2002 , Marina Del Rey, CA, June 2002
Abstract: We demonstrate a framework for improving the availability of cluster-based Internet services. Our approach models Internet services as a collection of interconnected components, each possessing well-defined interfaces and failure semantics. Such a decomposition allows designers to engineer high availability based on an understanding of the interconnections and isolated fault behavior of each component, as opposed to ad hoc methods. In this work, we focus on using the entire commodity workstation as a component because it possesses natural, fault-isolated interfaces. We define a failure event as a reboot because not only is a workstation unavailable during a reboot, but reboots are also symptomatic of a larger class of failures, such as configuration and operator errors. Our observations of 3 distinct clusters show that the time between reboots is best modeled by a Weibull distribution with shape parameters of less than 1, implying that a workstation becomes more reliable the longer it has been operating. Leveraging this observed property, we design an allocation strategy that withholds recently rebooted workstations from active service, validating their stability before allowing them to return to service. We show via simulation that this policy leads to a 70-30 rule of thumb: for a constant utilization, approximately 70% of workstation failures can be masked from end clients with 30% extra capacity added to the cluster, provided reboots are not strongly correlated. We also found our technique is most sensitive to the burstiness of reboots, as opposed to the absolute lengths of workstation uptimes.
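The shape-parameter observation can be checked directly from the Weibull hazard function; the parameter values below are illustrative, not the fitted values from the clusters studied.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous reboot rate h(t) = (shape/scale) * (t/scale)**(shape - 1).

    With shape < 1 the hazard decreases with uptime: the longer a
    workstation has been up, the less likely it is to reboot in the
    next instant, which is what motivates withholding freshly rebooted
    machines until they have proven stable."""
    return (shape / scale) * (t / scale) ** (shape - 1)
```

An exponential model (shape = 1) would make the reboot rate constant regardless of uptime, in which case validation-by-waiting would buy nothing; the shape < 1 finding is what makes the allocation strategy pay off.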
Ankur Choksi, Richard P. Martin, Badri Nath, Rahul Pupala, Mobility Support for Diffusion-based Ad-Hoc Sensor Networks, Department of Computer Science Technical Report DCS-TR-463, April 2002.
Abstract: In this work we investigate the application of four simple enhancements to directed diffusion to support mobile users. The first is a small change to diffusion which restarts path discovery when a node moves. The second is a simple handoff scheme similar to those used in cellular networks. The third introduces special proxy nodes as path anchors, and the fourth scheme provides hints to the diffusion layer to pre-create routes in an anticipated direction of motion. We compared the behavior of the various schemes for a mobile user tracking an object in a sensor network under four mobility scenarios. Our simulation results show that a combination of these simple enhancements to diffusion permits sink speeds of up to 7 meters per second without generating excessive overhead. Increasing the transmit power was also an effective method of reducing the impact of mobility. Finally, we found that random motion is the most difficult mobility scenario, so approaches shown to work under random motion should also work well under more realistic mobility conditions.
Richard P. Martin, Kiran Nagaraja, Thu D. Nguyen, Using Distributed Data Structures for Constructing Cluster-Based Services, In Proceedings of the First Workshop on Evaluating and Architecting System dependabilitY (EASY), Gothenburg, Sweden, July 2001
(gzipped Postscript, 24Kbytes)
Abstract: We argue that the key to making the construction of large-scale, reliable services tractable is to center these systems around a few familiar data structures, such as trees, hashes and lists. To further ease the burden of service construction, these structures should exist in the framework of modern languages (such as Java) with type checking, garbage collection and exception handling. In our approach, system programmers would spend considerable effort maintaining a few simple abstractions embodied in the data structures. The service designer can then focus on the proper choice of structures, consistency levels, and their composition, as opposed to the current method of stitching together a crazy quilt out of disparate software systems.
Taliver Heath, Richard P. Martin, Thu D. Nguyen, The Shape of Failure, In Proceedings of the First Workshop on Evaluating and Architecting System dependabilitY (EASY), Gothenburg, Sweden, July 2001
(gzipped Postscript, 130Kbytes)
Abstract: This work quantifies the statistical distribution of reboots for workstations. The Time-To-reBoot (TTB) is an important metric for service designers because it aggregates the failure and repair rates of many system components, including the operating system, the workstation hardware, and the operator, into a single metric that can be associated with a component with well-defined interfaces. Our analysis shows the TTB for workstation clusters is best modeled as a Weibull distribution with a shape parameter < 1. Our result implies that models using exponentially distributed times may not adequately describe workstation failures. In addition, a shape parameter < 1 means that workstations are best modeled as components that become more reliable with time, which has important ramifications for request distribution, data partitioning, and software rejuvenation.
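The shape-parameter argument can be made concrete with the Weibull hazard (instantaneous reboot) rate. The sketch below uses illustrative shape and scale values, not figures from the paper.

```python
def weibull_hazard(t, shape, scale=1.0):
    """Instantaneous failure (reboot) rate of a Weibull distribution:
    h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

# With shape < 1 the hazard falls with uptime: the longer a
# workstation has been up, the less likely it is to reboot soon.
assert weibull_hazard(100.0, shape=0.5) < weibull_hazard(1.0, shape=0.5)

# With shape > 1 the opposite (wear-out) behavior would hold,
# and shape = 1 recovers the memoryless exponential model.
assert weibull_hazard(100.0, shape=1.5) > weibull_hazard(1.0, shape=1.5)
```

A decreasing hazard is exactly what makes the validation policy above sensible: withholding a freshly rebooted machine lets it age past its highest-risk period before it serves clients.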
Taliver Heath, Samian Kaur, Richard P. Martin, Thu D. Nguyen, Quantifying the Impact of Architectural Scaling on Communication, Proc. of the Seventh International Symposium on High Performance Computer Architecture (HPCA-7), Monterrey, MX, Jan. 2001
(gzipped Postscript, 229Kbytes)
Abstract: This work quantifies how persistent increases in processor speed compared to I/O speed reduce the performance gap between specialized, high performance messaging layers and general purpose protocols such as TCP/IP and UDP/IP. The comparison is important because specialized layers sacrifice considerable system connectivity and robustness to obtain increased performance. We first quantify the scaling effects on small messages by measuring the LogP performance of two Active Message II layers, one running over a specialized VIA layer and the other over stock UDP as we scale the CPU and I/O components. We then predict future LogP performance by mapping the LogP model's network parameters, particularly overhead, into architectural components. Our projections show that the performance benefit afforded by specialized messaging for small messages will erode to a factor of 2 in the next 5 years. Our models further show that the performance differential between the two approaches will continue to erode without a radical restructuring of the I/O system. For long messages, we quantify the variable per-page instruction budget that a zero-copy messaging approach has for page table manipulations if it is to outperform a single-copy approach. Finally, we conclude with an examination of future I/O advances that would result in substantial improvements to messaging performance.
F. C. Wong, R. Martin, R. Arpaci-Dusseau, D. Culler, Architectural Requirements and Scalability of the NAS Parallel Benchmarks. Proc. of SC99 Conference on High Performance Networking and Computing, Portland, OR. Nov. 1999
Abstract: We present a study of the architectural requirements and scalability of the NAS Parallel Benchmarks. Through direct measurements and simulations, we identify the factors which affect the scalability of benchmark codes on two relevant and distinct platforms: a cluster of workstations and a ccNUMA SGI Origin 2000. We find that the benefit of increased global cache size is pronounced in certain applications and often offsets the communication cost. By constructing the working set profile of the benchmarks, we are able to visualize the improvement of computational efficiency under constant-problem-size scaling. We also find that, while the Origin MPI has better point-to-point performance, the cluster MPI layer is more scalable with communication load. However, communication performance within the applications is often much lower than what would be achieved by micro-benchmarks. We show that the communication protocols used by the MPI runtime library strongly influence communication performance in applications, and that the benchmark codes have a wide spectrum of communication requirements.
Richard P. Martin, A Systematic Characterization of Application Sensitivity to Network Performance, UC Berkeley Technical Report CSD-99-1055, Sept. 1999.
(PDF-772Kbytes and Postscript-1417Kbytes)
R. Martin, D. Culler, NFS Sensitivity to High Performance Networks. In SIGMETRICS '99 Conference on the Measurement and Modeling of Computer Systems, Atlanta, GA. May 1999
Abstract: This work examines NFS sensitivity to performance characteristics of emerging networks. We adopt an unusual method of inserting controlled delays into live systems to measure sensitivity to basic network parameters. We develop a simple queuing model of an NFS server and show that it reasonably characterizes our two live systems running the SPECsfs benchmark. Using the techniques in this work, we can infer the structure of servers from published SPEC results. Our results show that NFS servers are most sensitive to processor overhead; it can be the limiting factor with even a modest number of disks. Continued reductions in processor overhead will be necessary to realize performance gains from future multi-gigabit networks. NFS can tolerate network latency in the regime of newer LANs and IP switches. Due to NFS's historically high mix of small metadata operations, NFS is quite insensitive to network bandwidth. Finally, we find that the protocol enhancements in NFS version 3 tolerate high latencies better than version 2 of the protocol.
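The paper's server model is richer than this, but a minimal single-queue (M/M/1) sketch shows why per-operation processor overhead, which inflates service time, dominates response time as load grows; the rates below are hypothetical, not SPECsfs measurements.

```python
def mm1_response_time(arrival_rate, service_time):
    """Mean response time of a single-server queue (M/M/1):
    R = S / (1 - rho), where utilization rho = lambda * S must stay < 1."""
    rho = arrival_rate * service_time
    if rho >= 1.0:
        raise ValueError("server saturated (rho >= 1)")
    return service_time / (1.0 - rho)

# Extra per-op processor overhead stretches S, which raises both the
# response time and how early the server saturates as load increases.
fast = mm1_response_time(arrival_rate=800.0, service_time=0.001)    # 1 ms/op
slow = mm1_response_time(arrival_rate=800.0, service_time=0.00115)  # +15% overhead
assert slow > fast
```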
Marc E. Fiuczynski, Richard P. Martin, Tsutomu Owa, Brian N. Bershad. On Using Intelligent Network Interface Cards to support Multimedia Applications. In The 8th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 98), Cambridge, UK, July 1998.
Abstract: The emergence of fast, cheap embedded processors presents the opportunity for inexpensive processing to occur on the network interface. We are investigating how a system design incorporating such an intelligent network interface can be used to support streaming multimedia applications. We are developing an extensible execution environment, called SPINE, that enables applications to compute directly on the network interface and communicate with other applications executing on the host CPU, peer devices, and remote nodes. Using SPINE, we have implemented a video client that executes on the network interface, and transfers video data arriving from the network directly to the region of frame buffer memory representing the application's window. As a result of this system structure the video data is transferred only once over the I/O bus and places no load on the host CPU to display video at aggregate rates exceeding 80 Mbps.
Randy Wang, Arvind Krishnamurthy, R. Martin, Tom Anderson, David Culler. Modeling Communication Pipeline Latency. SIGMETRICS '98/PERFORMANCE '98 Joint International Conference on Measurement and Modeling of Computer Systems, Madison, WI, June 1998
Abstract: In this paper, we study how to minimize the latency of a message through a network that consists of a number of store-and-forward stages. This research is especially relevant for today's low overhead communication subsystems that employ dedicated processing elements for protocol processing. We develop an abstract pipeline model that reveals a crucial performance tradeoff. We subsequently exploit this tradeoff and present a series of fragmentation algorithms designed to minimize message latency. We provide an experimental methodology that enables the construction of customized pipeline algorithms that can adapt to the specific pipeline characteristics and application workloads. By applying this methodology to the Myrinet-GAM system, we have improved its latency by up to 51%. We also study the effectiveness of this technique for other realistic cases.
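The tradeoff the pipeline model exposes can be sketched as follows: more fragments improve overlap across stages but pay a fixed per-fragment cost. The cost parameters here are illustrative, not Myrinet-GAM measurements.

```python
def pipeline_latency(total_bytes, fragments, stages, per_frag_cost, per_byte_cost):
    """Latency of a message cut into equal fragments and sent through
    `stages` store-and-forward stages: the first fragment crosses every
    stage, then the remaining fragments drain one stage-time apart."""
    stage_time = per_frag_cost + (total_bytes / fragments) * per_byte_cost
    return (stages + fragments - 1) * stage_time

def best_fragment_count(total_bytes, stages, per_frag_cost, per_byte_cost,
                        max_fragments=256):
    """Pick the fragment count that minimizes the modeled latency."""
    return min(range(1, max_fragments + 1),
               key=lambda k: pipeline_latency(total_bytes, k, stages,
                                              per_frag_cost, per_byte_cost))

# A 4 KB message through 4 stages: neither one huge fragment nor maximal
# fragmentation is best; the optimum sits in between.
k = best_fragment_count(4096, stages=4, per_frag_cost=10.0, per_byte_cost=0.01)
assert 1 < k < 256
```

Searching this curve per pipeline and workload is, in spirit, what a customized fragmentation algorithm does: it adapts the fragment size to the measured stage characteristics.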
R. Martin, A. Vahdat, D. Culler, T. Anderson. Effects of Communication Latency, Overhead, and Bandwidth in a Cluster Architecture. International Symposium on Computer Architecture (ISCA), Denver, CO. June 1997
Abstract: This work provides a systematic study of the impact of communication performance on applications in a high performance network of workstations. We develop an experimental system in which the communication latency, overhead, and bandwidth can be independently varied to observe the effects on a wide range of applications. Our results indicate that current efforts to improve cluster communication performance to that of tightly integrated parallel machines result in significantly improved application performance. We show that applications demonstrate strong sensitivity to overhead, slowing down by a factor of 60 on 32 processors when overhead is increased from 3 to 103 µs. Applications in this study are also sensitive to per-message bandwidth, but are surprisingly tolerant of increased latency and lower per-byte bandwidth. Finally, most applications demonstrate a highly linear dependence on both overhead and per-message bandwidth, indicating that further improvements in communication performance will continue to improve application performance.
David Culler, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Brent Chun, Steve Lumetta, Alan Mainwaring, Richard Martin, Chad Yoshikawa, Frederick Wong. Parallel Computing on the Berkeley NOW. JSPP'97 9th Joint Symposium on Parallel Processing, Kobe, Japan, May 1997
Abstract: The UC Berkeley Network of Workstations (NOW) project demonstrates a new approach to large-scale system design enabled by technology advances that provide inexpensive, low latency, high bandwidth, scalable interconnection networks. This paper provides an overview of the hardware and software architecture of NOW and reports on the performance obtained at each layer of the system: Active Messages, MPI message passing, and benchmark parallel applications.
A. Dusseau, D. Culler, K. Schauser, R. Martin. Fast parallel sorting under LogP: experience with the CM-5. IEEE Transactions on Parallel and Distributed Systems, August 1996.
Abstract: In this paper, we analyze four parallel sorting algorithms (bitonic, column, radix, and sample sort) with the LogP model. LogP characterizes the performance of modern parallel machines with a small set of parameters: the communication latency (L), overhead (o), bandwidth (g), and the number of processors (P). We develop implementations of these algorithms in Split-C, a parallel extension to C, and compare the performance predicted by LogP to actual performance on a CM-5 of 32 to 512 processors for a range of problem sizes. We evaluate the robustness of the algorithms by varying the distribution and ordering of the key values. We also briefly examine the sensitivity of the algorithms to the communication parameters.
We show that the LogP model is a valuable guide in the development of parallel algorithms and a good predictor of implementation performance. The model encourages the use of data layouts which minimize communication and balanced communication schedules which avoid contention. With an empirical model of local processor performance, LogP predictions closely match observed execution times on uniformly distributed keys across a broad range of problem and machine sizes. We find that communication performance is oblivious to the distribution of the key values, whereas the local processor performance is not; some communication phases are sensitive to the ordering of keys due to contention. Finally, our analysis shows that overhead is the most critical communication parameter in the sorting algorithms.
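As a rough illustration of how the LogP parameters compose, a small-message burst between two processors can be timed as below; the parameter values in the assertion are made up for illustration, not CM-5 measurements.

```python
def logp_burst_time(L, o, g, messages=1):
    """Time for one processor to send `messages` small messages to another
    under LogP: the sender can issue a message every max(o, g) cycles, and
    the last message arrives L + o after its send overhead completes."""
    gap = max(o, g)
    return (messages - 1) * gap + o + L + o

# A single message costs the familiar o + L + o = 2o + L.
assert logp_burst_time(L=5, o=2, g=3, messages=1) == 9
```

The same accounting, applied per communication phase with an empirical model of local processor time, is what lets the LogP predictions in the paper track measured sorting times.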
David Culler, Lok T. Liu, Richard P. Martin, Chad Yoshikawa, LogP Performance Assessment of Fast Network Interfaces, IEEE Micro, February 1996.
(Draft) Postscript (511Kbytes)
Abstract: We present a systematic performance assessment of the hardware and software that provides the interface between applications and emerging high-speed networks. Using LogP as a conceptual framework and Active Messages as the communication layer, we devise a set of communication microbenchmarks. These generate a graphical signature from which we extract the LogP performance parameters of latency, overhead, and bandwidth. The method is illustrated on three diverse platforms: Intel Paragon, Meiko CS-2, and a cluster of SparcStations with Myrinet. The study provides a detailed breakdown of the differences in communication performance among the platforms. While the details of our microbenchmark depend on Active Messages, the methodology can be applied to conventional communication layers.
R. Martin. HPAM: An Active Message Layer for a Network of HP Workstations. In Hot Interconnects II, Aug. 1994
Abstract: This document describes an Active Message layer we have constructed on a Network of Workstations. Active Messages is a thin, highly optimized communication layer targeted at the library or compiler writer. A primary goal of an Active Message layer is to deliver the minimum latency and peak bandwidth of the network hardware to user programs. Previous work on Active Messages demonstrated an order of magnitude performance improvement over vendor supplied send and receive libraries for Massively Parallel Processors.
The documents posted on this web page have been provided by the contributing authors to ensure timely dissemination of scholarly and technical work on a noncommercial basis. All rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically.