
Solving large sparse linear systems for integer factorization on GPUs 463
Sliced COO subformat   Small         Medium        Large
Memory sharing         No sharing    Among warp    Among block
Access method          Direct        Atomic XOR    Atomic XOR
Bank conflict          No            No            Yes
# Rows per slice       12            192           6144
TABLE 20.2. Sliced COO subformat comparison (# rows per slice is based
on a blocking factor of n = 64).
is no bank accessed by more than one thread. Thus, there is no bank conflict.
A ⊕-reduction (XOR reduction) operation on shared memory is required to
combine the partial results from each thread.
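
The combining step can be illustrated with a small host-side sketch. It only models the arithmetic, not the GPU kernel: each "thread" is assumed to hold its own private copy of the slice rows as 64-bit words, and the copies are merged pairwise with XOR (addition over GF(2)). The function name `xor_reduce_partials` and the list-of-lists layout are hypothetical and chosen for illustration only.

```python
def xor_reduce_partials(partials):
    """Combine per-thread partial results row by row with XOR.

    partials[t][r] is the (hypothetical) partial result of thread t
    for row r of the slice, stored as an integer bit vector.
    """
    # Work on copies so the caller's data is left untouched.
    threads = [list(p) for p in partials]
    stride = 1
    # Tree reduction: in each round, thread t absorbs thread t + stride.
    while stride < len(threads):
        for t in range(0, len(threads) - stride, 2 * stride):
            threads[t] = [a ^ b for a, b in zip(threads[t], threads[t + stride])]
        stride *= 2
    # After the last round, thread 0 holds the XOR of all partial results.
    return threads[0]


# Three threads, two rows per slice.
combined = xor_reduce_partials([[1, 2], [3, 4], [5, 6]])
```

On a GPU the inner loop of each round would run in parallel with a barrier between rounds; the sequential loop here only mirrors the order of the pairwise combinations.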
The maximum number of rows per slice is calculated as the size of shared
memory per SM in bits divided by (number of threads per block × blocking factor).
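
As a worked example of this formula, the sketch below evaluates it for one configuration. The 48 KiB of shared memory and 512 threads per block are assumptions, not values stated in the text; they are used here because, with n = 64, they reproduce the 12 rows per slice listed for the Small subformat in Table 20.2. The function name `max_rows_per_slice` is hypothetical.

```python
def max_rows_per_slice(shared_mem_bytes, threads_per_block, blocking_factor):
    """Rows per slice when every thread keeps a private copy of the slice.

    Each row occupies `blocking_factor` bits, and each of the
    `threads_per_block` threads needs its own copy in shared memory.
    """
    shared_mem_bits = shared_mem_bytes * 8
    return shared_mem_bits // (threads_per_block * blocking_factor)


# Assumed configuration: 48 KiB shared memory, 512 threads per block, n = 64.
rows = max_rows_per_slice(48 * 1024, 512, 64)
```

Integer division is used because a slice cannot contain a fractional row; any leftover shared memory simply stays unused.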