
where 0 < y < (ny − 1) and 0 < z < (nz − 1), are executed first. Then, the
values associated with the bordering vector elements are exchanged between the
neighbors. Finally, the values of the bordering vector elements are updated.
In this way, the computation of the local vector elements is performed
concurrently with the data exchanges between neighboring CPUs, in both the
synchronous and asynchronous cases.
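
The ordering described above can be illustrated with a minimal sketch. It is not the chapter's GPU implementation: it assumes a single two-dimensional (y, z) block, replaces the projected relaxation by a plain four-point average, and simulates the neighbor exchange with a background thread. The names relax_point, overlapped_sweep, reference_sweep, make_halo, and fetch_halo are hypothetical and introduced only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def relax_point(u, halo, y, z):
    """Average of the four (y, z) neighbors; off-block neighbors come from halo."""
    ny, nz = len(u), len(u[0])

    def value(yy, zz):
        if 0 <= yy < ny and 0 <= zz < nz:
            return u[yy][zz]
        return halo[(yy, zz)]  # value received from a neighboring block

    return 0.25 * (value(y - 1, z) + value(y + 1, z)
                   + value(y, z - 1) + value(y, z + 1))


def overlapped_sweep(u, fetch_halo):
    """One sweep: interior first (overlapping the exchange), border last."""
    ny, nz = len(u), len(u[0])
    new = [row[:] for row in u]
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the (simulated) exchange of bordering values in the background.
        pending = pool.submit(fetch_halo)
        # Interior elements, 0 < y < ny - 1 and 0 < z < nz - 1, need no remote data.
        for y in range(1, ny - 1):
            for z in range(1, nz - 1):
                new[y][z] = relax_point(u, {}, y, z)
        # Wait for the exchanged values before touching the border.
        halo = pending.result()
    # Border elements are updated last, using the received values.
    for y in range(ny):
        for z in range(nz):
            if y in (0, ny - 1) or z in (0, nz - 1):
                new[y][z] = relax_point(u, halo, y, z)
    return new


def reference_sweep(u, halo):
    """Same sweep without any overlap, used only to check the result."""
    ny, nz = len(u), len(u[0])
    return [[relax_point(u, halo, y, z) for z in range(nz)] for y in range(ny)]


def make_halo(ny, nz, value):
    """Constant halo around an ny-by-nz block (stand-in for neighbor data)."""
    halo = {}
    for z in range(nz):
        halo[(-1, z)] = value
        halo[(ny, z)] = value
    for y in range(ny):
        halo[(y, -1)] = value
        halo[(y, nz)] = value
    return halo


if __name__ == "__main__":
    u = [[float(y * 5 + z) for z in range(5)] for y in range(4)]
    halo = make_halo(4, 5, 1.5)
    print(overlapped_sweep(u, lambda: halo) == reference_sweep(u, halo))
```

Because each new value is computed from the previous iterate only, splitting the sweep into an interior phase and a border phase does not change the result; it only changes when the exchanged values are needed. In an MPI and CUDA setting, the background thread would typically be replaced by non-blocking communications and asynchronous device-host copies.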
In Table 14.3, we report the execution times and the number of relaxations
performed on a cluster of 12 GPUs by the parallel projected ...