Fast hydrodynamics on heterogeneous many-core hardware 271
code produces correct results. An analysis of the numerical efficiency has also
been carried out on different GPU systems to identify comparative behaviors
as both the problem size and the number of compute nodes vary. For example,
performance scaling on Test environment 1 and Test environment 3 is presented
in Figure 11.5. The figure confirms that there is only a limited benefit
from using multiple GPUs for small problem sizes, since the computational
intensity is simply too low to efficiently hide the latency of message passing.
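The latency argument can be illustrated with a toy strong-scaling model. This is only a sketch under stated assumptions, not the performance model used for Figure 11.5: the function names, the per-GPU throughput, and the fixed communication cost are all hypothetical values chosen for illustration, and communication is assumed to overlap perfectly with computation, so the time per step is the larger of the two.

```python
# Toy strong-scaling model. Every constant is an illustrative assumption,
# not a measurement from the test environments discussed in the text.

def time_per_step(n_points, n_gpus, throughput=1.0e9, comm_time=100.0e-6):
    """Estimated wall time (s) of one step on n_gpus devices.

    throughput: grid points processed per second by one GPU (assumed)
    comm_time:  fixed cost of one message-passing exchange (assumed)
    """
    compute = n_points / (n_gpus * throughput)
    if n_gpus == 1:
        return compute  # no message passing on a single device
    # Assumed perfect overlap: communication is hidden only if the
    # computation lasts at least as long as the exchange.
    return max(compute, comm_time)


def speedup(n_points, n_gpus, **kwargs):
    """Speedup relative to the single-GPU time for the same problem size."""
    return time_per_step(n_points, 1, **kwargs) / time_per_step(n_points, n_gpus, **kwargs)


# Small problems are bound by the exchange cost; large ones scale with GPU count.
for n in (10**4, 10**6, 10**8):
    print(n, [round(speedup(n, p), 2) for p in (1, 2, 4, 8)])
```

In this model the speedup stays below one whenever the per-GPU compute time is shorter than the exchange, and approaches the number of GPUs once the computation is long enough to hide the communication.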
For sufficiently large problems, however, a substantial speedup over the single-GPU
version is achieved, and even larger systems can be solved. With the linear scaling of memory
requirements and