Bank conflicts and their effect on shared memory

A good speedup over global memory does not necessarily mean that we are using shared memory effectively. This becomes clearer when we look at the profiler metrics. If we switch from guided analysis to unguided analysis for the profiler output, matrix_transpose.prof, we see that the shared memory access pattern shows alignment problems, as shown in the following screenshot:

The profiler reports nonoptimal usage of shared memory, which is a sign of bank conflicts.
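To make the symptom more concrete, the following is a minimal host-side sketch rather than the kernel that was profiled. It assumes the layout used by current NVIDIA GPUs, where shared memory has 32 banks and consecutive 4-byte words fall into consecutive banks, and it models one warp in which thread t reads element [t][column] of a float tile. The names kNumBanks, kWarpSize, and maxThreadsPerBank are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <array>

// Assumption: 32 banks, each one 32-bit word wide, and 32 threads per warp.
constexpr int kNumBanks = 32;
constexpr int kWarpSize = 32;

// Models one warp in which thread t reads tile[t][column] from a float tile
// whose rows are rowStride floats long. Returns the largest number of threads
// that land in the same bank: 1 means conflict-free, 32 means the access is
// fully serialized.
int maxThreadsPerBank(int rowStride, int column) {
    std::array<int, kNumBanks> hits{};  // per-bank access counters, zeroed
    for (int t = 0; t < kWarpSize; ++t) {
        int wordIndex = t * rowStride + column;  // offset in 4-byte words
        ++hits[wordIndex % kNumBanks];           // bank that serves this word
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

Under these assumptions, maxThreadsPerBank(32, 0) returns 32, because every thread in the warp reads from the same bank when it walks down a column of a 32-float-wide tile. With one float of padding per row, maxThreadsPerBank(33, 0) returns 1, because each thread lands in a different bank.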

To effectively understand this alignment problem, it is important to understand the concept of ...
