The purpose of this chapter is to explain how to program multiple OpenACC devices to work cooperatively on a single problem.
At the end of this chapter, the reader will have a basic understanding of:
• How to program multidevice systems or accelerated clusters with OpenACC using a single host thread, OpenMP, or MPI
• How to coordinate the work of multiple devices using a domain decomposition strategy
• How to use the async clause to overlap computation and MPI communication
• How to use the NVIDIA® tools for MPI+OpenACC applications