Performance Profilers and Debugging Tools for OpenMP Programs
Thursday, May 09, 2019, 10:00am
OpenMP is a popular application programming interface (API) used to write shared memory parallel programs. It supports a wide range of parallel constructs that allow expressing different types of parallelism, including fork-join and task-based parallelism. Using OpenMP, a developer can parallelize a program by incrementally adding parallelism to it until their performance goals are met.
In this dissertation, we address the problem of assisting developers in meeting the two major goals of writing parallel programs in OpenMP: (1) performance and (2) correctness. First, writing OpenMP programs that achieve scalable performance is challenging. An OpenMP program that achieves reasonable speedup on a system with a low core count may not achieve scalable speedup when run on a system with more cores. Traditional profilers report program regions where significant serial work is performed. In a parallel program, optimizing such regions may not improve performance, since doing so may not increase the program's parallelism. To address this problem, we introduce OMP-WHIP, a parallelism profiler for OpenMP programs with what-if analyses. Using our novel performance model, OMP-WHIP identifies serialization bottlenecks by measuring the inherent parallelism in the program and its OpenMP directives. OMP-WHIP's what-if analyses let developers estimate the increase in parallelism from optimizing user-specified regions of code before designing concrete optimizations, which helps them identify the regions that must be optimized first to achieve scalable speedup. While a lack of inherent parallelism is sufficient to prevent a program from achieving scalable speedup, too much parallelism can cause excessive runtime and scheduling overheads, which lower performance. We address this issue by extending OMP-WHIP to measure tasking overheads. By attributing this additional information to different OpenMP directives in the program, OMP-WHIP can identify tasking cut-offs that strike the right balance between a program's parallelism and its scheduling overheads for a given input.
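Inherent parallelism is commonly defined in the work/span model. Work is the total serial cost of all program fragments. Span is the cost of the longest chain of fragments that must run in series. Parallelism is work divided by span.

The Python sketch below computes these quantities over a hypothetical series-parallel tree and adds a simple what-if step that assumes a named region gets faster by a given factor. The classes `Leaf`, `Series` and `Parallel`, the `speedups` parameter, and the cost numbers are all illustrative assumptions. This is not OMP-WHIP's actual performance model or interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class Leaf:
    """A serial program fragment with a measured cost (hypothetical units)."""
    name: str
    cost: float


@dataclass
class Series:
    """Children execute one after another."""
    children: List["Node"] = field(default_factory=list)


@dataclass
class Parallel:
    """Children may execute concurrently (e.g., sibling OpenMP tasks)."""
    children: List["Node"] = field(default_factory=list)


Node = Union[Leaf, Series, Parallel]


def work_span(node: Node, speedups: Optional[Dict[str, float]] = None) -> Tuple[float, float]:
    """Return (work, span) for a series-parallel tree.

    `speedups` maps a leaf name to an assumed speedup factor for that region;
    this is the what-if part: costs are rescaled before aggregation.
    """
    speedups = speedups or {}
    if isinstance(node, Leaf):
        cost = node.cost / speedups.get(node.name, 1.0)
        return cost, cost
    results = [work_span(child, speedups) for child in node.children]
    work = sum(w for w, _ in results)
    if isinstance(node, Series):
        span = sum(s for _, s in results)  # serial composition: spans add up
    else:
        span = max(s for _, s in results)  # parallel composition: longest branch dominates
    return work, span


def parallelism(node: Node, speedups: Optional[Dict[str, float]] = None) -> float:
    work, span = work_span(node, speedups)
    return work / span


# A toy program: serial setup, four equal parallel tasks, then a serial reduction.
program = Series([
    Leaf("init", 10),
    Parallel([Leaf("task_a", 40), Leaf("task_b", 40), Leaf("task_c", 40), Leaf("task_d", 40)]),
    Leaf("serial_reduce", 50),
])

print(parallelism(program))                          # 2.2  (work 220, span 100)
print(parallelism(program, {"serial_reduce": 5.0}))  # 3.0  (work 180, span 60)
print(parallelism(program, {"task_a": 2.0}))         # 2.0  (work 200, span 100)
```

The last two lines mirror the abstract's point. Speeding up the serial reduction shortens the span and raises parallelism. Speeding up one parallel task only removes work off the critical path, so parallelism actually drops.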
Second, writing correct parallel programs is challenging because of bugs such as deadlocks, livelocks, and race conditions that do not arise in serial programs. A data race occurs when two parallel fragments of a program access the same memory location and at least one of the accesses is a write. Data races are a common cause of bugs in OpenMP programs. Manually identifying and reproducing data races is difficult because of the nondeterminism of the programs in which they occur. We introduce OMPRace, a dynamic data race detector for OpenMP. OMPRace uses our novel OpenMP Series-Parallel Graph (OSPG) to identify the logical series-parallel relations between different fragments of the program. OMPRace constructs a program's OSPG on the fly and uses these relations to detect data races, which enables it to detect races in all possible schedules of a given program for a given input. Compared to the state of the art, OMPRace correctly identifies races in a larger subset of OpenMP programs, including those that use task dependencies and locks, while incurring similar overheads.
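A detector can reason about logical series-parallel relations rather than the interleaving observed in one run. Such a detector reports a race even when the conflicting accesses happened to execute in order during that run.

The Python sketch below illustrates this with a generic series-parallel parse tree, which is an assumption made for illustration. In such a tree, two fragments are logically parallel exactly when their lowest common ancestor is a parallel node.

This is a simplified relative of the idea and not the OSPG itself, which the abstract describes as also handling task dependencies and locks. The class and helper names (`Node`, `lca`, `logically_parallel`, `find_races`) and the access-log format are hypothetical.

```python
from itertools import combinations


class Node:
    """Series-parallel parse tree node; kind is 'S', 'P', or 'leaf'."""

    def __init__(self, kind, name=None, children=()):
        self.kind = kind
        self.name = name
        self.parent = None
        self.depth = 0
        self.children = list(children)
        for child in self.children:
            child.parent = self


def set_depths(node, depth=0):
    """Record each node's depth so the LCA walk can align the two paths."""
    node.depth = depth
    for child in node.children:
        set_depths(child, depth + 1)


def lca(a, b):
    """Lowest common ancestor by walking parent pointers."""
    while a.depth > b.depth:
        a = a.parent
    while b.depth > a.depth:
        b = b.parent
    while a is not b:
        a, b = a.parent, b.parent
    return a


def logically_parallel(a, b):
    """Two distinct fragments may run concurrently iff their LCA is a P node."""
    return a is not b and lca(a, b).kind == "P"


def find_races(accesses):
    """accesses: list of (fragment, address, is_write). Returns racing pairs.

    Pairwise comparison is for clarity only; practical detectors keep a
    per-location access history instead of comparing every pair.
    """
    races = []
    for (f1, addr1, w1), (f2, addr2, w2) in combinations(accesses, 2):
        if addr1 == addr2 and (w1 or w2) and logically_parallel(f1, f2):
            races.append((f1.name, f2.name, addr1))
    return races


# Two sibling tasks touch x, followed by code after a taskwait that reads x.
t1 = Node("leaf", "task1")
t2 = Node("leaf", "task2")
after = Node("leaf", "after_taskwait")
root = Node("S", children=[Node("P", children=[t1, t2]), after])
set_depths(root)

accesses = [(t1, "x", True), (t2, "x", False), (after, "x", False)]
print(find_races(accesses))  # [('task1', 'task2', 'x')]
```

The verdict depends only on the tree and the access log, not on which task actually ran first. This schedule independence is what lets a detector of this kind cover all schedules for a single input.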
Our results from testing the OMP-WHIP and OMPRace prototypes on more than 45 OpenMP applications and benchmarks indicate their effectiveness in performance profiling and data race detection, respectively. Furthermore, they demonstrate the usefulness of our novel OSPG abstraction in enabling different types of analyses for OpenMP programs.
Speaker: Nader Boushehrinejad
Location: CoRE 305
Committee: Prof. Santosh Nagarakatte (Chair), Prof. Badri Nath, Prof. Srinivas Narayana, Prof. Martha Kim (Columbia University)
Event Type: Pre-Defense
Computer Science Department