Albert-Jan N. Yzelman; Dirk Roose; Karl Meerbergen (KU Leuven, Belgium)
Sparse matrix-vector (SpMV) multiplication is a key kernel in scientific computing. Computing it efficiently on modern architectures is difficult because of high bandwidth pressure and inefficient cache use. Although the Intel Xeon Phi offers high memory bandwidth, efficient code remains difficult to achieve there because of high data access latencies. We alleviate this issue by integrating vectorization into state-of-the-art parallel SpMV multiplication strategies.
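To make the kernel concrete, here is a minimal sequential sketch of y = Ax in C++. It assumes a compressed row storage layout, chosen here only for illustration; the `CsrMatrix` struct and the `spmv_csr` function are hypothetical names and are not the data structure or code this chapter proposes. The sketch is neither parallel nor vectorized.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container for a sparse matrix in compressed row storage.
// All names are illustrative assumptions, not taken from the chapter.
struct CsrMatrix {
    std::size_t rows;                    // number of matrix rows
    std::vector<std::size_t> row_start;  // size rows + 1; row i owns [row_start[i], row_start[i+1])
    std::vector<std::size_t> col_index;  // column of each stored nonzero
    std::vector<double> values;          // value of each stored nonzero
};

// Computes y = A * x with a plain sequential loop.
std::vector<double> spmv_csr(const CsrMatrix& A, const std::vector<double>& x) {
    std::vector<double> y(A.rows, 0.0);
    for (std::size_t i = 0; i < A.rows; ++i) {
        double sum = 0.0;
        for (std::size_t k = A.row_start[i]; k < A.row_start[i + 1]; ++k) {
            // values and col_index are streamed once (bandwidth pressure);
            // x is read indirectly through col_index (irregular, cache-unfriendly).
            sum += A.values[k] * x[A.col_index[k]];
        }
        y[i] = sum;
    }
    return y;
}
```

Each stored nonzero is used in exactly one multiply-add, so the loop performs little arithmetic per byte loaded, and the indirect reads of `x` follow the sparsity pattern rather than a regular stride. Those two properties correspond to the bandwidth and cache difficulties described above.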
We present a novel data structure that is a strict improvement on the industry-standard compressed ...