
technique may also be used to overlap such transfers with computation on the
GPU.
16.6 Propagation of multiple concurrent energies on GPU
Finally, we present an improvement that takes advantage of the Fermi
architecture, as well as of the more recent Kepler architecture, both of which
enable the concurrent execution of multiple CUDA kernels, thus offering
additional speedup on GPUs for kernels of small or medium computational grain.
In our case, the performance gain on the GPU is indeed limited since most
matrices are of small or medium size. By using multiple streams within one
CUDA context [8], ...
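
To make the idea of concurrent kernels concrete, the following is a minimal CUDA sketch, under stated assumptions, of how several small and independent computations can be issued in separate streams of a single context. It is not the implementation discussed in this chapter: the kernel scale_kernel, the function run_concurrent, and the parameters n_energies and n are hypothetical names introduced only for illustration, and the trivial scaling operation stands in for the real per-energy work. Whether the kernels actually overlap depends on the device and on the resources each kernel occupies.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error: %s\n",                  \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical stand-in for the per-energy work: scales a small vector.
__global__ void scale_kernel(double *x, int n, double factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= factor;
}

// Issues one small task per "energy", each in its own stream.
// Returns true if every element has the expected value afterwards.
bool run_concurrent(int n_energies, int n) {
    const size_t total = static_cast<size_t>(n_energies) * n;
    const size_t bytes = static_cast<size_t>(n) * sizeof(double);

    // Pinned host memory, so that the asynchronous copies can overlap.
    double *h_x = nullptr;
    CUDA_CHECK(cudaMallocHost(&h_x, total * sizeof(double)));
    for (size_t i = 0; i < total; ++i) h_x[i] = 1.0;

    std::vector<cudaStream_t> streams(n_energies);
    std::vector<double *> d_x(n_energies, nullptr);
    for (int e = 0; e < n_energies; ++e) {
        CUDA_CHECK(cudaStreamCreate(&streams[e]));
        CUDA_CHECK(cudaMalloc(&d_x[e], bytes));
    }

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    for (int e = 0; e < n_energies; ++e) {
        double *h = h_x + static_cast<size_t>(e) * n;
        // Copy in, compute, copy out: ordered within stream e only,
        // so work queued in different streams may run concurrently.
        CUDA_CHECK(cudaMemcpyAsync(d_x[e], h, bytes,
                                   cudaMemcpyHostToDevice, streams[e]));
        scale_kernel<<<blocks, threads, 0, streams[e]>>>(d_x[e], n, e + 1.0);
        CUDA_CHECK(cudaGetLastError());
        CUDA_CHECK(cudaMemcpyAsync(h, d_x[e], bytes,
                                   cudaMemcpyDeviceToHost, streams[e]));
    }
    CUDA_CHECK(cudaDeviceSynchronize());  // wait for all streams

    // 1.0 * (e + 1) is exact in double precision, so compare exactly.
    bool ok = true;
    for (int e = 0; e < n_energies; ++e)
        for (int i = 0; i < n; ++i)
            if (h_x[static_cast<size_t>(e) * n + i] != e + 1.0) ok = false;

    for (int e = 0; e < n_energies; ++e) {
        CUDA_CHECK(cudaFree(d_x[e]));
        CUDA_CHECK(cudaStreamDestroy(streams[e]));
    }
    CUDA_CHECK(cudaFreeHost(h_x));
    return ok;
}

int main() {
    bool ok = run_concurrent(8, 1024);
    std::printf("%s\n", ok ? "all streams produced the expected values"
                           : "mismatch detected");
    return ok ? 0 : 1;
}
```

In this sketch each stream owns its device buffer, so no synchronization between streams is needed before the final cudaDeviceSynchronize. With kernels this small, a single launch leaves most of the GPU idle, which is the situation in which issuing several launches in different streams can help.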