NVIDIA GmbH, Würselen, Germany
The purpose of this chapter is to explain how to program multiple OpenACC devices to work cooperatively on a single problem.
At the end of this chapter the reader will have a basic understanding of:
• How to program multidevice systems or accelerated clusters with OpenACC using a single host thread, OpenMP, or MPI
• How to coordinate the work of multiple devices using a domain decomposition strategy
• How to use the async clause to overlap computation and MPI communication
• How to use the NVIDIA® tools for MPI+OpenACC applications
Multidevice programming; OpenACC; Domain decomposition; GPU; CUDA-aware MPI; Debugging; ...
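As a rough illustration of the single-host-thread approach combined with domain decomposition, the following minimal sketch splits a one-dimensional array into contiguous chunks, one per device, and launches the work for each chunk asynchronously. It is not taken from the chapter: the helper names `chunk_bounds` and `scale_multi_device` are hypothetical, and the code assumes that all devices of the current device type should share the work evenly. The OpenACC runtime calls are guarded by `_OPENACC`, so the code falls back to a single serial chunk when compiled without OpenACC support.

```c
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Hypothetical helper: compute the half-open range [*start, *end) of
   elements owned by device `dev` when `n` elements are divided as evenly
   as possible among `num_devs` devices. The first (n % num_devs) devices
   receive one extra element. */
static void chunk_bounds(int n, int num_devs, int dev, int *start, int *end)
{
    int base = n / num_devs;
    int rem  = n % num_devs;
    *start = dev * base + (dev < rem ? dev : rem);
    *end   = *start + base + (dev < rem ? 1 : 0);
}

/* Hypothetical example kernel: scale x[0..n) by `a`, with each device
   processing its own chunk. A single host thread drives all devices. */
static void scale_multi_device(float *x, int n, float a)
{
    int num_devs = 1;
#ifdef _OPENACC
    acc_device_t dev_type = acc_get_device_type();
    num_devs = acc_get_num_devices(dev_type);
    if (num_devs < 1) num_devs = 1;
#endif

    /* Launch phase: select each device in turn and start its chunk
       asynchronously so the devices can work concurrently. */
    for (int dev = 0; dev < num_devs; ++dev) {
        int start, end;
        chunk_bounds(n, num_devs, dev, &start, &end);
        if (end == start) continue;   /* nothing to do for this device */
#ifdef _OPENACC
        acc_set_device_num(dev, dev_type);
#endif
        #pragma acc parallel loop copy(x[start:end-start]) async
        for (int i = start; i < end; ++i) {
            x[i] *= a;
        }
    }

    /* Synchronization phase: wait for the asynchronous work queued on
       each device before the host uses the results. */
    for (int dev = 0; dev < num_devs; ++dev) {
#ifdef _OPENACC
        acc_set_device_num(dev, dev_type);
#endif
        {
            #pragma acc wait
        }
    }
}
```

Separating the launch loop from the wait loop is what lets the devices run at the same time; waiting inside the first loop would serialize them. The same chunking helper could be reused with one OpenMP thread or one MPI rank per device, where `dev` would be the thread or rank index instead of a loop counter.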