VDHA: Vector-Driven Hash Aggregation for Sparse Matrix–Sparse Vector Multiplication on GPUs

Abstract

Sparse matrix–sparse vector multiplication (SpMSpV) is a core primitive in graph analytics and scientific computing. On GPUs, the performance of prevalent SpMSpV implementations is often bottlenecked by the write-back phase that accumulates non-zero multiply–accumulate results: its many-to-one index scatter pattern causes severe conflicts and poor bandwidth utilization. We present VDHA, a GPU-based weighted SpMSpV kernel that leverages block-private hash tables for local aggregation, substantially reducing write conflicts and improving memory coalescing. To further amplify this benefit, we incorporate column splitting with lightweight reordering to expose more locality, and employ a fetch–compute–writeback pipeline to overlap hash computation with memory accesses. Extensive evaluation on over 300 matrices with more than 5 million nonzeros each, including web-scale graphs (KONECT/LAW) and scientific workloads (SuiteSparse), shows that VDHA consistently outperforms state-of-the-art baselines. On web graphs, it achieves a 1.41× geometric-mean speedup (up to 3.42×), while on SuiteSparse it delivers 1.13× (up to 2.29×). We also provide a lightweight predictive model that identifies matrices favorable to VDHA with 91.3% accuracy.
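To make the aggregation idea concrete, the following is a minimal sequential sketch (not the paper's CUDA kernel) of hash-based accumulation for SpMSpV: partial products scattered to the same output row are merged in a hash table instead of being written back individually, which is the conflict the abstract describes. The CSC layout and function name are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch: hash-based accumulation for y = A @ x, where A is stored
# in CSC form and x is a sparse vector given as (indices, values).
# A per-block hash table on the GPU plays the role of `acc` below; here a
# plain Python dict stands in to show the many-to-one merge.

def spmspv_hash(csc_colptr, csc_rowidx, csc_vals, x_idx, x_vals):
    """Return (row_indices, values) of the sparse result y = A @ x."""
    acc = {}  # hash table: output row index -> accumulated partial sum
    for j, xj in zip(x_idx, x_vals):
        # Walk column j of A; every nonzero contributes A[r, j] * x[j].
        for p in range(csc_colptr[j], csc_colptr[j + 1]):
            r = csc_rowidx[p]
            acc[r] = acc.get(r, 0.0) + csc_vals[p] * xj
    rows = sorted(acc)
    return rows, [acc[r] for r in rows]
```

For a 2×2 matrix A = [[1, 0], [2, 3]] (CSC: colptr [0, 2, 3], rowidx [0, 1, 1], vals [1, 2, 3]) and dense-as-sparse x = [1, 2], the call returns rows [0, 1] with values [1.0, 8.0], i.e. y = A @ x.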

Publication
Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming
渠鹏
Assistant Researcher