O'Reilly logo

CUDA Fortran for Scientists and Engineers by Massimiliano Fatica, Gregory Ruetsch

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 3

Optimization

Abstract

This chapter goes through the steps that one typically takes to optimize a CUDA Fortran application. Reducing the overhead of transferring data between the host and device is discussed first in terms of reducing the amount of data transferred and making such transfers efficient and possible. Once the data are on the device, we discuss at length how to efficiently access these data from kernels, including topics of global memory data coalescing, use of on-chip shared memory, and the read-only constant and texture memories. The topic of launching kernels with enough parallelism, whether in the form of instruction-level or thread-level parallelism, is also discussed. The final section covers instruction optimization. ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required