Chapter 7

Multidevice programming with OpenACC

Jiri Kraus, NVIDIA GmbH, Würselen, Germany

Abstract

The purpose of this chapter is to explain how to program multiple OpenACC devices to work cooperatively on a single problem.

At the end of this chapter the reader will have a basic understanding of:

• How to program multidevice systems or accelerated clusters with OpenACC using a single host thread, OpenMP, or MPI

• How to coordinate the work of multiple devices using a domain decomposition strategy

• How to use the async clause to overlap computation and MPI communication

• How to use the NVIDIA® tools for MPI+OpenACC applications
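As a rough illustration of the domain decomposition idea named above, the following minimal sketch splits the rows of a grid into one contiguous block per device. It is not taken from the chapter: the `row_range` type and the `rows_for_device` helper are hypothetical names introduced only for this example, and a plain row-wise split is assumed.

```c
#include <stddef.h>

/* Hypothetical helper type (not from the chapter): the half-open
   row range [start, end) assigned to one device. */
typedef struct {
    size_t start;
    size_t end;
} row_range;

/* Split n_rows as evenly as possible across n_devices.
   The first (n_rows % n_devices) devices each take one extra row,
   so block sizes differ by at most one. */
row_range rows_for_device(size_t n_rows, int n_devices, int device)
{
    size_t base  = n_rows / (size_t)n_devices;
    size_t extra = n_rows % (size_t)n_devices;
    size_t d     = (size_t)device;
    row_range r;

    r.start = d * base + (d < extra ? d : extra);
    r.end   = r.start + base + (d < extra ? 1 : 0);
    return r;
}
```

In an OpenACC program, each host thread or MPI rank would typically select its device (for example with `acc_set_device_num`) and then run its parallel loop only over the rows in its own range. Neighboring blocks would also need to exchange boundary rows, which this sketch leaves out.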

Multidevice programming; OpenACC; Domain decomposition; GPU; CUDA-aware MPI; Debugging; ...
