
140 Designing Scientific Applications on GPUs
loads are transferred between neighboring nodes [8]. In such a case, the load
evaluation and the comparison with other nodes can be done in parallel with
the main computations without perturbing them.
7.4 Perspective: a unifying programming model
In the previous sections we have seen that controlling a distributed GPU
application with commonly available tools is a challenging task. To
summarize, such an application has components that can be roughly
classified as:
CPU: CPU-bound computations, realized as procedures in the chosen
programming language
CUDAkern: GPU-bound computations, in