Chapter 4

Multi-GPU Programming

Abstract

This chapter covers how to write code that uses multiple GPUs. Although many configurations of host processes and devices are possible in multi-GPU code, this chapter focuses on two: (1) a single host process driving multiple GPUs through CUDA's peer-to-peer capabilities, introduced in the CUDA 4.0 Toolkit, and (2) MPI, where each MPI process uses a separate GPU. As an example of each approach, we implement peer-to-peer and MPI multi-GPU versions of the transpose code from the previous chapter.
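To give a flavor of the first configuration, the following is a minimal sketch of checking for and enabling peer access between two devices from a single host process. It assumes a system with at least two peer-capable GPUs; the device numbers and variable names are illustrative only.

```fortran
program p2pSketch
  use cudafor
  implicit none
  integer :: istat, canAccess

  ! check whether device 0 can address device 1's memory directly
  istat = cudaDeviceCanAccessPeer(canAccess, 0, 1)

  if (canAccess == 1) then
     ! peer access is enabled from the currently selected device,
     ! so select device 0 first
     istat = cudaSetDevice(0)
     ! second argument is a flag reserved for future use; must be 0
     istat = cudaDeviceEnablePeerAccess(1, 0)
  end if
end program p2pSketch
```

Once peer access is enabled, transfers between the two devices' memories (and, with unified virtual addressing, direct access from kernels) can proceed without staging through host memory.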

Keywords

Peer-to-peer; Unified virtual addressing (UVA); Direct transfer; Direct access; Message Passing Interface (MPI); Compute mode; NVIDIA System Management ...
