algorithms that are well suited for such systems, forcing developers to search
for novel methods that utilize concurrency.
To ease software development, we use MPI-2 for message passing and ensure
a safe and private communication space by creating a communicator private
to the library during initialization, as recommended by Hoefler and
Snir [12]. With the addition of remote direct memory access (RDMA) for
GPUDirect, it is possible to transfer memory directly between GPUs of recent
generations (Kepler), eliminating CPU overhead. Unfortunately, enabling
these features imposes strict system and driver requirements. Therefore,
in the following examples, device memory is first ...