Chapter 3: Performance measurement and metrics

Abstract

After establishing the correctness of a CUDA Fortran code, we need to obtain accurate performance measurements and a good understanding of how such metrics should be interpreted to identify performance bottlenecks. In this chapter, we discuss several methods of obtaining these performance measurements using CPU timers, CUDA events, and profiling tools such as NVIDIA Nsight Compute in command-line and graphical modes, as well as a Fortran module that provides some interfaces to the NVIDIA Tools Extension (NVTX) library, which can be used to customize profiling. We then discuss how timing information can be used to determine the limiting factor of kernel execution. Finally, we discuss how to ...
