SIMD Acceleration of SpMV Kernel on Multi-core CPU Architecture
Abstract
As the field of Sparse Matrix vector multiplication (SpMV) matures and its breadth of application increases, the need for parallel implementation becomes necessary. SpMV is proved to be a bottleneck due to its irregular access patterns. Various storage formats for sparse matrices have been proposed to solve this issue. The Quad tree-Compressed Row Storage (QCSR) format shows good performance improvement over other formats such as Compressed Row Storage (CSR) and Blocked CSR (BCSR) for SpMV. This paper extends QCSR format to exploit the Single Instruction Multiple Data (SIMD) registers which are available in current processors. Programming with SIMD registers in a single core processor achieves parallelism with reduced power and without any additional hardware requirement, as in the case of Graphics Processing Unit (GPU) computing. To program effectively in SIMD units Intel’s Streaming SIMD Extension (SSE) instructions are used. In this paper computational performance of QCSR-SpMV is determined for over a collection of 10 benchmark matrices on SIMD units of X86 architecture. Experimental results demonstrate QCSR-SIMD achieves significant average speedup of 2.0 x compared to CSR-SIMD.