Regu2D: Accelerating Vectorization of SpMV on Intel Processors through 2D-partitioning and Regular Arrangement

Abstract

Sparse matrix-vector multiplication (SpMV) is an elementary kernel of many high-performance computing (HPC) applications, and it is often one of the performance bottlenecks of them. Accelerating SpMV on vector processors usually faces several issues including irregular data accesses, memory bandwidth limitation, and the short vector problem. Based on a detailed analysis of the effects and interactions of various technologies introduced by state-of-the-art studies (ALBUS, CVR, CSR5, SELL-C-σ etc.), we propose Regu2D, a comprehensive solution to accelerate vectorization of SpMV through three methods: adaptive 2D-partitioning, the regular arrangement of matrix elements, and indices compression. Dynamic programming algorithms are used to optimize the first two methods. We conduct experiments on Intel Xeon processors (Skylake architecture) which support AVX-512 SIMD instructions and use sparse matrices from the University of Florida Sparse Matrix Collection. Experiments show that Regu2D achieves an average speedup of 1.69X, 1.93X, 1.40X, and 1.20X over ALBUS, CVR, CSR5, and SELL-C-σ for 30 scale-free sparse matrices, respectively. For 16 HPC sparse matrices, Regu2D achieves an average speedup of 1.34X, 1.89X, 1.34X, and 1.50X over them, respectively.

Publication
In Proceedings of the 50th International Conference on Parallel Processing