CS Events

PhD Defense

HARDWARE-SOFTWARE TECHNIQUES FOR ACCELERATING SPARSE COMPUTATION

 


Tuesday, July 12, 2022, 10:00am

 

Speaker: Mohammadreza Soltaniyeh

Location: Virtual

Committee

Professor Santosh Nagarakatte (Chair)

Professor Richard P. Martin

Professor Yipeng Huang

Professor Joe Devietti

Event Type: PhD Defense

Abstract: Linear algebra kernels are widely used in fields such as machine learning, data science, physical science, and graph analysis. Many of these applications work with sparse data (i.e., only a small fraction of the data is non-zero). Sparse data are often stored in a compressed format (i.e., a sparse format) that keeps only the non-zero elements, along with metadata identifying where those elements are located. Using compressed formats eliminates the need to store and process zeros, making the storage and computation of sparse kernels more efficient.

General-purpose architectures such as CPUs and GPUs cannot deliver the same performance for sparse linear algebra kernels as they do for their dense counterparts. First, accessing non-zero elements in a sparse format introduces many indirect and irregular memory accesses that are incompatible with the SIMD and caching mechanisms of CPUs and GPUs. In addition, Dennard scaling has ended and Moore's law is slowing down, closing the era in which general-purpose architectures transparently became faster and more energy efficient. This has led to a plethora of research on specialized hardware, such as FPGAs and ASICs, to improve the performance and energy efficiency of sparse kernels. A key strategy for specialized hardware is to customize the sparse format (i.e., the storage) according to the operation's memory access pattern, the pattern of non-zero elements in the input (i.e., the sparsity pattern), and the underlying hardware structures. This approach is effective when the operations and input sparsity patterns do not change. However, applications often perform various operations on sparse data, the sparse inputs may change for each execution, and each input may have a different sparsity pattern. When this happens, the performance of specialized hardware degrades because a reformatting step is required to convert the data into a format compatible with the hardware. This reformatting can be expensive when it cannot be overlapped with the computation on the hardware or amortized over multiple executions with the same input data.

This dissertation presents hardware-software techniques that enhance the performance and energy efficiency of some of the most important sparse problems, including sparse matrix-vector multiplication (SpMV), sparse general matrix-matrix multiplication (SpGEMM), and sparse convolutional neural networks (CNNs). The key insight of our method is to use software to reformat the sparse data into a hardware-friendly format, allowing the hardware to perform the computation with a high degree of parallelism. The software improves design flexibility by supporting multiple sparse formats, and the hardware improves performance and energy efficiency. We applied these hardware-software techniques to SpMV, SpGEMM, and sparse CNNs, problems with different characteristics such as different input densities and distinct sparsity-pattern features.

The contributions of this dissertation can be summarized as follows. First, we present a synergistic CPU-FPGA system to accelerate the SpMV and SpGEMM kernels. In our proposed design, the CPU reorganizes sparse data into a format suitable for the FPGA, and the FPGA computes with high parallelism using the preprocessed data. We develop an intermediate representation that allows the software to communicate regularized data and scheduling decisions to the FPGA. Moreover, most of the CPU and FPGA execution is overlapped. Our approach can effectively handle sparse kernels with low input densities and sparsity patterns that vary across inputs.

Second, we present a hardware accelerator for sparse CNN inference tasks. We formulate the convolution operation as general matrix-matrix multiplication (GEMM) using an image-to-column (IM2COL) transformation. With a dynamically reconfigurable GEMM unit and a novel IM2COL hardware unit, our design supports the various layers of CNNs with high performance. Our design also exploits sparsity in both weights and feature maps. We use software to perform group-wise pruning, followed by a preprocessing step that puts the pruned weights into our hardware-friendly sparse format for efficient, high-performance computation. We built an ASIC and an FPGA prototype of our accelerator. Our design is faster and more energy efficient than CPUs and GPUs for sparse CNNs.
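To make the compressed-format idea above concrete, here is a minimal Python/NumPy sketch of SpMV over the compressed sparse row (CSR) format. CSR is used only as a familiar illustration; the function name and the toy matrix are assumptions for this sketch, not the format or kernel used in the dissertation's accelerator. The indirect access through the column-index array is the kind of irregular memory access that makes sparse kernels inefficient on CPUs and GPUs.

    import numpy as np

    def spmv_csr(values, col_idx, row_ptr, x):
        """Compute y = A @ x for a matrix A stored in CSR form."""
        n_rows = len(row_ptr) - 1
        y = np.zeros(n_rows)
        for i in range(n_rows):
            # Only the non-zeros of row i are visited; col_idx is the
            # metadata that locates each stored value in the dense matrix.
            for k in range(row_ptr[i], row_ptr[i + 1]):
                y[i] += values[k] * x[col_idx[k]]   # indirect, irregular access
        return y

    # A = [[5, 0, 0],
    #      [0, 0, 3],
    #      [2, 0, 4]]  stored as CSR:
    values  = np.array([5.0, 3.0, 2.0, 4.0])
    col_idx = np.array([0, 2, 0, 2])
    row_ptr = np.array([0, 1, 2, 4])
    x = np.array([1.0, 2.0, 3.0])
    print(spmv_csr(values, col_idx, row_ptr, x))    # -> [ 5.  9. 14.]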
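Similarly, the IM2COL formulation mentioned in the second contribution can be sketched in a few lines: each sliding window of the input feature map is unrolled into a column, so the convolution reduces to a single matrix multiplication. This dense, single-channel, single-filter sketch only illustrates the transformation itself (the helper names and shapes are assumptions); the dissertation's hardware applies the idea to sparse, pruned weights.

    import numpy as np

    def im2col(x, kh, kw):
        """Unroll every kh-by-kw window of a single-channel input into a column."""
        h, w = x.shape
        out_h, out_w = h - kh + 1, w - kw + 1
        cols = np.zeros((kh * kw, out_h * out_w))
        for i in range(out_h):
            for j in range(out_w):
                cols[:, i * out_w + j] = x[i:i + kh, j:j + kw].ravel()
        return cols

    def conv_as_gemm(x, kernel):
        """Convolution (stride 1, no padding) expressed as a matrix product."""
        kh, kw = kernel.shape
        cols = im2col(x, kh, kw)            # (kh*kw) x (out_h*out_w)
        out = kernel.ravel() @ cols         # the GEMM step (a single filter here)
        out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
        return out.reshape(out_h, out_w)

    x = np.arange(16, dtype=float).reshape(4, 4)    # toy 4x4 feature map
    k = np.array([[1.0, 0.0],
                  [0.0, -1.0]])                     # toy 2x2 filter
    print(conv_as_gemm(x, k))                       # 3x3 output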

Organization

Department of Computer Science

School of Arts & Sciences

Rutgers University

Contact: Professor Santosh Nagarakatte

Zoom: https://rutgers.zoom.us/j/95080066950?pwd=WHU0NG02b3k2L0tLTUEwaVQ3ZFlYQT09