CS Events

PhD Defense

FAST METHODS TO DETECT AND DEBUG NUMERICAL ERRORS WITH SHADOW EXECUTION

 

Download as iCal file

Tuesday, July 19, 2022, 10:00am

 

Speaker: Sangeeta Chowdhary

Location : Virtual

Committee

Professor Santosh Nagarakatte (Chair)
Professor Richard Martin 
Professor Mridul Aanjaneya 
Professor Sreepathi Pai (University of Rochester)

Event Type: PhD Defense

Abstract: The floating-point (FP) representation uses a finite number of bits to approximate real numbers in computer systems. Due to rounding errors, arithmetic using the FP representation may diverge from arithmetic with real numbers. For primitive FP operations, the rounding error is small and bounded. However, with the sequence of FP operations, rounding errors can accumulate. Such rounding errors can be magnified by certain operations. The magnified error can affect the program's control flow and the output compared to an execution with infinite bits of precision. It is challenging for users to identify such bugs, as in such cases the program does not crash but generates the incorrect output. Without any oracle in high precision, it is hard to differentiate between correct and incorrect output. Detecting such bugs in long-running programs becomes even more challenging. This dissertation proposes a fast, yet precise mechanism to detect and debug numerical errors in long-running programs. This dissertation makes the following contributions. First, we propose a selective shadow execution framework to detect and debug numerical errors. Our idea is to use shadow execution with high-precision computation for comprehensive numerical error detection. The FP variable is shadowed with a high precision value in this approach. On every FP computation, an equivalent high precision computation is performed. If there is a significant difference between FP computation and high precision computation, the error is reported to the user. To debug errors, we maintain the additional information about the instructions. We use additional information to generate the directed acyclic graph (DAG) of instructions showing the error propagation. The DAG of instructions helps the user identify the root cause of the error. Our prototype FPSanitizer for floating-point is an order of magnitude faster than the prior work. Second, we propose a novel technique to run shadow execution in parallel to reduce the overheads further. In our approach, the user specifies parts of the program that need to be debugged. Our compiler creates shadow execution tasks that mirror these specified regions in the original program but perform equivalent high precision computation. To execute the shadow tasks in parallel, we need to break the dependency between them by providing the appropriate memory state and input arguments. Moreover, to correctly detect the numerical errors in the original program, shadow tasks need to follow the same control flow as in the original program. Our key insight is to use FP values computed by the original program to start the shadow task from some arbitrary point. To ensure shadow tasks follow the same control flow as the original program, our compiler updates every branch instruction in the shadow task to use the branch outcomes of the original program. As a result, the original program and shadow tasks execute in a decoupled fashion and communicate via a non-blocking queue. Our prototype PFPSANITIZER is significantly faster than the FPSANITIZER.Finally, we propose an alternative lightweight oracle to reduce the overheads of shadow execution. Although parallel shadow execution effectively reduces overheads, it requires user input. Often, users may not know where the numerical bugs are present. This thesis proposes a fast shadow execution framework, EFTSANITIZER, that uses error-free transformations (EFTs) to detect anddebug numerical bugs comprehensively. EFTs are a set of algorithms that provide a mechanism to capture the rounding error using primitive FP instructions. For certain FP computations rounding error can be represented as an FP value. Based on this observation, EFTs transform FP computation to derive the rounding error. In our approach, we maintain the error with each memory location and compute the error for a sequence of FP operations using error composition with EFTs. EFTSANITIZER provides a trace of instructions that help users isolate and debug the root cause of the errors. In addition, EFTSANITIZER is an order of magnitude faster than FPSANITIZER.

Organization

Department of Computer Science

School of Arts & Sciences

Rutgers University

Contact  Professor Santosh Nagarakatte

Zoom: https://rutgers.zoom.us/j/97290959498?pwd=YXN1cS9RZ044U0ltMUJLbDU1d2NMZz09

Meeting ID: 972 9095 9498
Password: 479636