CS Events

PhD Defense

A GPU Binary Analysis Framework for Memory Performance and Safety

 


Monday, December 20, 2021, 12:00pm - 02:00pm

 

Speaker: Ari Hayes

Location: Via Zoom

Committee

Professor Zheng Zhang (advisor, chair)

Professor Ulrich Kremer

Professor Manish Parashar

Professor Chen Ding (external member, University of Rochester)

Event Type: PhD Defense

Abstract: General-Purpose Graphics Processing Units (GPUs) have attained popularity for their massive concurrency. But because of their relative infancy and the prevalence of closed-source, proprietary technology, GPU software has not undergone the degree of optimization that CPU software has. In this work, we focus on NVIDIA's GPUs and the CUDA framework, but our techniques may be generalized to other GPU platforms. We develop a compiler that targets the low-level SASS assembly, rather than the high-level source code or the intermediate-level PTX assembly. This allows for a degree of tuning and modification not possible at higher levels. We reconstruct program information such as the control-flow graph and call graph, thereby permitting data-flow analysis. Thanks to extensive reverse engineering, our compiler retains compatibility with numerous versions of the CUDA framework and several generations of NVIDIA GPUs. We are able to improve memory performance with optimizations not available in the proprietary compiler. We perform memory allocation across the multiple types of on-chip memory, making full use of available resources including registers, scratchpad memory, and the L1 data cache. This further allows us to tune occupancy, the number of threads allowed to be simultaneously active. We perform static and dynamic occupancy tuning in order to find an effective balance between concurrency and resource contention. Our compiler can also be applied toward improving memory safety. We use it to implement dynamic taint tracking, a technique previously used on CPUs to identify sensitive data as it spreads through memory. By analyzing and modifying the low-level assembly, we minimize tracking overhead, track memory resources not visible to the programmer, and erase sensitive data before it has the opportunity to leak. We evaluate our compiler across a number of benchmarks on NVIDIA devices of the Fermi, Kepler, Maxwell, and Pascal architectures. We demonstrate that our resource allocation and occupancy tuning provide significant improvements in both speed and energy usage. We additionally show that our GPU-specific optimizations for taint tracking can substantially reduce overhead.

Organization

Rutgers University School of Arts and Sciences

Contact: Professor Zheng Zhang

Zoom Room:
https://rutgers.zoom.us/j/5460863055?pwd=bFozSnZkWkpxZ0Npc2VhS040Q0tKQT09

Meeting ID: 546 086 3055
Password: 215637