June 2013
528 pages
This chapter describes CUDA’s facilities for multi-GPU programming, including threading models, peer-to-peer, and inter-GPU synchronization. As an example, we’ll first explore inter-GPU synchronization using CUDA streams and events by implementing a peer-to-peer memcpy that stages through portable pinned memory. We then discuss how to implement the N-body problem (fully described in Chapter 14) with single- and multithreaded implementations that use multiple GPUs.
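The staged copy mentioned above can be sketched as follows. This is a minimal illustration, not the chapter's own implementation: the names `stagedPeerCopy`, `verifyStagedCopy`, and `CHECK`, the two-buffer scheme, and the 1 MiB chunk size are all assumptions made for the example. The source GPU copies each chunk into a portable pinned staging buffer and records an event; the destination GPU's stream waits on that event before copying the chunk onward, then records its own event so the source knows when the buffer can be reused.

```cuda
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical error-checking macro: abort on any CUDA runtime failure.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s failed: %s\n", #call,           \
                         cudaGetErrorString(err_));                  \
            std::exit(EXIT_FAILURE);                                 \
        }                                                            \
    } while (0)

// Hypothetical helper: copy `bytes` from `src` on srcDev to `dst` on dstDev,
// staging through two portable pinned host buffers of `chunk` bytes each.
void stagedPeerCopy(void *dst, int dstDev, const void *src, int srcDev,
                    size_t bytes, size_t chunk = 1 << 20)
{
    void *staging[2];
    cudaEvent_t d2hDone[2], h2dDone[2];
    cudaStream_t srcStream, dstStream;

    // Portable pinned memory is treated as pinned by every CUDA context.
    for (int i = 0; i < 2; ++i)
        CHECK(cudaHostAlloc(&staging[i], chunk, cudaHostAllocPortable));

    // Each stream and the events recorded in it belong to the same device.
    CHECK(cudaSetDevice(srcDev));
    CHECK(cudaStreamCreate(&srcStream));
    for (int i = 0; i < 2; ++i) CHECK(cudaEventCreate(&d2hDone[i]));
    CHECK(cudaSetDevice(dstDev));
    CHECK(cudaStreamCreate(&dstStream));
    for (int i = 0; i < 2; ++i) CHECK(cudaEventCreate(&h2dDone[i]));

    size_t n = 0;
    for (size_t offset = 0; offset < bytes; offset += chunk, ++n) {
        const int b = static_cast<int>(n & 1);  // alternate staging buffers
        const size_t len = std::min(chunk, bytes - offset);

        CHECK(cudaSetDevice(srcDev));
        // Before reusing buffer b, wait until the destination has drained it.
        if (n >= 2) CHECK(cudaStreamWaitEvent(srcStream, h2dDone[b], 0));
        CHECK(cudaMemcpyAsync(staging[b],
                              static_cast<const char *>(src) + offset, len,
                              cudaMemcpyDeviceToHost, srcStream));
        CHECK(cudaEventRecord(d2hDone[b], srcStream));

        CHECK(cudaSetDevice(dstDev));
        // The destination stream waits for the source GPU to fill buffer b.
        CHECK(cudaStreamWaitEvent(dstStream, d2hDone[b], 0));
        CHECK(cudaMemcpyAsync(static_cast<char *>(dst) + offset, staging[b],
                              len, cudaMemcpyHostToDevice, dstStream));
        CHECK(cudaEventRecord(h2dDone[b], dstStream));
    }

    // The last host-to-device copy finishing implies every chunk has arrived.
    CHECK(cudaStreamSynchronize(dstStream));

    for (int i = 0; i < 2; ++i) {
        CHECK(cudaEventDestroy(d2hDone[i]));
        CHECK(cudaEventDestroy(h2dDone[i]));
        CHECK(cudaFreeHost(staging[i]));
    }
    CHECK(cudaStreamDestroy(srcStream));
    CHECK(cudaStreamDestroy(dstStream));
}

// Hypothetical check: round-trip `count` ints through the staged copy.
// Falls back to copying within device 0 when only one GPU is present.
bool verifyStagedCopy(size_t count)
{
    int deviceCount = 0;
    CHECK(cudaGetDeviceCount(&deviceCount));
    const int srcDev = 0;
    const int dstDev = (deviceCount > 1) ? 1 : 0;
    const size_t bytes = count * sizeof(int);

    std::vector<int> input(count), output(count, -1);
    for (size_t i = 0; i < count; ++i) input[i] = static_cast<int>(i);

    int *dSrc = nullptr, *dDst = nullptr;
    CHECK(cudaSetDevice(srcDev));
    CHECK(cudaMalloc(&dSrc, bytes));
    CHECK(cudaMemcpy(dSrc, input.data(), bytes, cudaMemcpyHostToDevice));
    CHECK(cudaSetDevice(dstDev));
    CHECK(cudaMalloc(&dDst, bytes));

    stagedPeerCopy(dDst, dstDev, dSrc, srcDev, bytes);

    CHECK(cudaSetDevice(dstDev));
    CHECK(cudaMemcpy(output.data(), dDst, bytes, cudaMemcpyDeviceToHost));
    CHECK(cudaFree(dDst));
    CHECK(cudaSetDevice(srcDev));
    CHECK(cudaFree(dSrc));
    return input == output;
}

int main()
{
    // An odd element count makes the final chunk a partial one.
    const bool ok = verifyStagedCopy((3u << 20) + 123);
    std::printf("staged copy %s\n", ok ? "matched" : "MISMATCHED");
    return ok ? 0 : 1;
}
```

Two staging buffers let the device-to-host copy of one chunk overlap the host-to-device copy of the previous chunk. The sketch relies on `cudaStreamWaitEvent` accepting an event from another device, whereas `cudaEventRecord` needs the event and stream to belong to the same device, which is why each event is created and recorded under the device that signals it.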
Systems with multiple GPUs generally contain multi-GPU boards with a PCI Express bridge chip (such as the GeForce GTX 690) or multiple PCI Express slots, or both, as described in Section 2.3. Each GPU in such a system is separated by PCI Express ...