Chapter 2

Performance Measurement and Metrics

Abstract

A prerequisite to performance optimization is a means to accurately time portions of a code and to use such timing information to assess code performance. In this chapter we first discuss how to time kernel execution using CPU timers, CUDA events, and the Command Line Profiler as well as the nvprof profiling tool. We then discuss how timing information can be used to determine the limiting factor of kernel execution. Finally, we discuss how to calculate performance metrics, especially those related to bandwidth, and how such metrics should be interpreted.
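As a rough illustration of how a timing measurement becomes a bandwidth metric, the sketch below times a host-side memory copy with a CPU timer and converts the result to an effective bandwidth in GB/s. It is a minimal host-only analogy written in Python, not CUDA Fortran and not code from the chapter: the helper names `effective_bandwidth_gb_s`, `time_call`, and `copy_bytes` are hypothetical, and the formula assumes effective bandwidth is taken as bytes read plus bytes written, divided by elapsed time. When timing a GPU kernel with a host timer, the host would additionally need to synchronize with the device before reading the clock, since kernel launches are asynchronous.

```python
import time


def effective_bandwidth_gb_s(bytes_read, bytes_written, seconds):
    """Effective bandwidth in GB/s: total bytes moved divided by elapsed time."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return (bytes_read + bytes_written) / 1e9 / seconds


def time_call(fn, *args, repeats=5):
    """Return the shortest wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def copy_bytes(src):
    """Stand-in for a copy kernel: reads every byte of src and writes a new buffer."""
    return bytearray(src)


n = 1 << 24                      # 16 MiB test buffer (arbitrary size for illustration)
src = bytearray(n)
elapsed = time_call(copy_bytes, src)
bw = effective_bandwidth_gb_s(n, n, elapsed)   # n bytes read plus n bytes written
print(f"copy of {n} bytes: {elapsed * 1e3:.3f} ms, {bw:.2f} GB/s")
```

Taking the best of several runs is one common way to reduce timer noise; comparing the resulting figure against the hardware's theoretical peak bandwidth indicates how close the operation is to being memory-bound.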

Keywords

Timing; Performance metrics; CUDA events; Profiling; Bandwidth; Arithmetic throughput; Synchronization

A prerequisite ...

From CUDA Fortran for Scientists and Engineers.
