Chapter 2

Performance Measurement and Metrics


A prerequisite to performance optimization is a means to accurately time portions of a code and then to use that timing information to assess code performance. In this chapter we first discuss how to time kernel execution using CPU timers, CUDA events, the Command Line Profiler, and the nvprof profiling tool. We then discuss how timing information can be used to determine the limiting factor of kernel execution. Finally, we discuss how to calculate performance metrics, especially those related to bandwidth, and how such metrics should be interpreted.


Timing; Performance metrics; CUDA events; Profiling; Bandwidth; Arithmetic throughput; Synchronization
