TaskProf Tutorial @ PLDI 2018
Debugging and Profiling Task Parallel Programs with TaskProf
When: Monday, June 18, 2018 (AM)
Where: PLDI 2018, Philadelphia, PA, USA
Task parallelism is an effective approach for writing performance-portable parallel programs. In this model, the programmer expresses the parallelism in the program by specifying tasks, and the task runtime dynamically maps these tasks to processors, balancing the workload with work-stealing algorithms. A number of languages and frameworks support task parallelism (e.g., Intel Threading Building Blocks (TBB), Cilk, Habanero Java, X10, Microsoft Task Parallel Library, Java Fork/Join tasks, and the Rust language). This tutorial will present tools and techniques for performance and correctness debugging of task parallel programs written using the Intel Threading Building Blocks (TBB) library.
This tutorial will educate participants on identifying parallelism bottlenecks in task parallel programs. A common metric used to quantify the performance of a task parallel program is asymptotic parallelism, which measures the potential speedup when the program is executed on a large number of processors. It is constrained by the longest chain of tasks that must be executed sequentially (also known as the span or the critical work). Hence, asymptotic parallelism is the ratio of the total work to the critical work performed by the program for a given input. A scalable program must have high asymptotic parallelism. A task parallel program can have low asymptotic parallelism for multiple reasons: coarse-grained tasks, limited work performed by the program, and secondary effects of execution such as contention, poor locality, and false sharing.
In this tutorial, we will introduce TaskProf, a profiler that measures asymptotic parallelism in task parallel programs for a given input. TaskProf computes an accurate parallelism profile by performing fine-grained attribution of work to various parts of the program using the structure of a task parallel execution. We will also highlight what-if analyses in TaskProf. TaskProf's what-if profile allows users to estimate improvements in parallelism when regions of code are optimized, even before concrete optimizations for them are known. Finally, we will have a hands-on exercise where participants use TaskProf to identify parallelism bottlenecks, estimate the improvement from optimizing the bottlenecks, and design concrete optimizations to obtain performance improvements.
Part I: Introduction to task parallelism and challenges in writing performance-portable programs
Part II: Core algorithms used for parallelism profiling and what-if analyses
Part III: Demo of profiling with TaskProf
Part IV: Demo of bottleneck identification with TaskProf
Part V: Hands-on session with TaskProf to find bottlenecks in two TBB applications
Prerequisites: familiarity with C++ and an interest in the performance analysis of parallel programs.
Artifacts for the Tutorial
- Measurement of inherent parallelism in a task parallel program
- Construction of series-parallel relationships and fine-grained measurements in task parallel programs
- What-if analyses to identify serialization bottlenecks
- Building debugging tools using series-parallel relationships
- Adarsh Yoga and Santosh Nagarakatte - A Fast Causal Profiler for Task Parallel Programs - FSE 2017.
- Adarsh Yoga, Santosh Nagarakatte, and Aarti Gupta - Parallel Data Race Detection for Task Parallel Programs with Locks - FSE 2016.